Abstract

The feedback capacity of the stationary Gaussian additive noise channel has been
open, except for the case where the noise is white. Here we find the feedback capacity
of the stationary first-order moving average additive Gaussian noise channel in closed
form. Specifically, the channel is given by $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, where the
input $\{X_i\}$ satisfies a power constraint and the noise $\{Z_i\}$ is a first-order moving
average Gaussian process defined by $Z_i = \alpha U_{i-1} + U_i$, $|\alpha| \le 1$, with white Gaussian
innovations $U_i$, $i = 0, 1, \ldots$.

We show that the feedback capacity of this channel is $-\log x_0$, where $x_0$ is the
unique positive root of the equation $\rho x^2 = (1 - x^2)(1 - |\alpha| x)^2$, and $\rho$ is the ratio of the
average input power per transmission to the variance of the noise innovation $U_i$. The
optimal coding scheme parallels the simple linear signalling scheme by Schalkwijk
and Kailath for the additive white Gaussian noise channel: the transmitter sends
a real-valued information-bearing signal at the beginning of communication and
subsequently refines the receiver's error by processing the feedback noise signal through
a linear stationary first-order autoregressive filter. The resulting error probability of
the maximum likelihood decoding decays doubly exponentially in the duration of the
communication. This feedback capacity of the first-order moving average Gaussian
channel is very similar in form to the best known achievable rate for the first-order
autoregressive Gaussian noise channel studied by Butman, Wolfowitz, and Tiernan,
although the optimality of the latter is yet to be established.



Index Terms — Additive Gaussian noise channels, capacity, feedback, feedback capacity, 
first-order moving average, Gaussian feedback capacity, linear signalling. 
1 Introduction and Summary



Consider the additive Gaussian noise channel with feedback as depicted in Figure 1. The
channel $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$, has additive Gaussian noise $Z_1, Z_2, \ldots$, where
$Z^n = (Z_1, \ldots, Z_n) \sim N_n(0, K_Z)$. We wish to communicate a message $W \in \{1, 2, \ldots, 2^{nR}\}$
reliably over the channel $Y^n = X^n + Z^n$. The channel output is causally fed back to the
transmitter.

[Figure 1: Gaussian channel with feedback. The decoder output is $\hat{W} = \hat{W}_n(Y^n)$.]

We specify a $(2^{nR}, n)$ code with the codewords$^1$ $(X_1(W), X_2(W, Y_1), \ldots, X_n(W, Y^{n-1}))$
satisfying the expected power constraint

$$E\, \frac{1}{n} \sum_{i=1}^{n} X_i^2(W, Y^{i-1}) \le P \tag{1}$$

and decoding function $\hat{W}_n : \mathbb{R}^n \to \{1, 2, \ldots, 2^{nR}\}$. The probability of error $P_e^{(n)}$ is
defined by

$$P_e^{(n)} := \Pr\{\hat{W}_n(Y^n) \ne W\},$$

where the message $W$ is independent of $Z^n$ and is uniformly distributed over $\{1, 2, \ldots, 2^{nR}\}$.
We call the sequence $\{C_{n,FB}\}_{n=1}^{\infty}$ an $n$-block feedback capacity sequence if for every $\epsilon > 0$,
there exists a sequence of $(2^{n(C_{n,FB} - \epsilon)}, n)$ codes with $P_e^{(n)} \to 0$ as $n \to \infty$, and for every
$\epsilon > 0$ and any sequence of codes with $2^{n(C_{n,FB} + \epsilon)}$ codewords, $P_e^{(n)}$ is bounded away from
zero for all $n$. We define the feedback capacity $C_{FB}$ as

$$C_{FB} := \lim_{n \to \infty} C_{n,FB}$$

if the limit exists. This definition of feedback capacity agrees with the usual operational
definition for the capacity of memoryless channels without feedback as the supremum of
achievable rates.



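In [2], Cover and Pombra characterized the $n$-block feedback capacity $C_{n,FB}$ as

$$C_{n,FB} = \max \frac{1}{2n} \log \frac{\det(K_Y)}{\det(K_Z)}. \tag{2}$$

$^1$More precisely, encoding functions $X_i : \{1, \ldots, 2^{nR}\} \times \mathbb{R}^{i-1} \to \mathbb{R}$, $i = 1, 2, \ldots, n$.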
Here $K_X = K_X(n)$, $K_Y = K_Y(n)$ and $K_Z = K_Z(n)$ respectively denote the covariance
matrices of $X^n$, $Y^n$, and $Z^n$, and the maximization is over all $X^n$ of the form $X^n = BZ^n + V^n$
with a strictly lower-triangular $n \times n$ matrix $B = B(n)$ and multivariate Gaussian $V^n$
independent of $Z^n$ such that $E \sum_{i=1}^{n} X_i^2 = \operatorname{tr} K_X \le nP$. Equivalently, we can rewrite (2)
as

$$C_{n,FB} = \max_{K_V, B} \frac{1}{2n} \log \frac{\det\bigl((B + I) K_Z (B + I)^T + K_V\bigr)}{\det(K_Z)} \tag{3}$$

where the maximization is over all nonnegative definite $n \times n$ matrices $K_V = K_V(n)$ and
strictly lower triangular $n \times n$ matrices $B = B(n)$ such that $\operatorname{tr}(B K_Z B^T + K_V) \le nP$.
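To make the objective in (3) concrete, the following sketch evaluates it numerically for a given pair $(B, K_V)$. The function names `n_block_rate` and `input_power` are illustrative choices, not notation from the paper, and the example only checks the familiar white-noise, no-feedback point ($K_Z = I$, $B = 0$, $K_V = P I$); it does not perform the maximization.

```python
import numpy as np

def n_block_rate(B, K_V, K_Z):
    """Objective of (3): (1/2n) log[det((B+I) K_Z (B+I)^T + K_V) / det(K_Z)], in nats."""
    n = K_Z.shape[0]
    I = np.eye(n)
    K_Y = (B + I) @ K_Z @ (B + I).T + K_V
    # slogdet avoids overflow/underflow that log(det(.)) can hit for larger n.
    _, logdet_Y = np.linalg.slogdet(K_Y)
    _, logdet_Z = np.linalg.slogdet(K_Z)
    return (logdet_Y - logdet_Z) / (2 * n)

def input_power(B, K_V, K_Z):
    """tr(B K_Z B^T + K_V); a feasible pair must keep this at most n*P."""
    return np.trace(B @ K_Z @ B.T + K_V)

# White noise, no feedback (B = 0), uniform power: the rate should be (1/2) log(1 + P).
n, P = 4, 2.0
K_Z = np.eye(n)
B = np.zeros((n, n))
K_V = P * np.eye(n)
rate = n_block_rate(B, K_V, K_Z)
power = input_power(B, K_V, K_Z)
```

Any strictly lower-triangular `B` can be substituted to explore how feedback trades power between the $B K_Z B^T$ and $K_V$ terms.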

When the noise process $\{Z_n\}$ is stationary, the $n$-block capacity is super-additive in
the sense that

$$n\, C_{n,FB} + m\, C_{m,FB} \le (n + m)\, C_{n+m,FB}, \qquad \text{for all } n, m = 1, 2, \ldots.$$

Consequently, the feedback capacity $C_{FB}$ is well-defined (see, for example, Pólya and
Szegő [3]) as

$$C_{FB} = \lim_{n \to \infty} C_{n,FB}
= \lim_{n \to \infty} \max_{B(n), K_V(n)} \frac{1}{2n} \log \frac{\det\bigl((B + I) K_Z (B + I)^T + K_V\bigr)}{\det(K_Z)}. \tag{4}$$

To obtain a closed-form expression for the feedback capacity $C_{FB}$, however, we need to go
further than (4), since the above characterization does not give any hint on the sequence of
the optimal $\{B(n), K_V(n)\}_{n=1}^{\infty}$ achieving $C_{n,FB}$ or, more importantly, its limiting behavior.

In this paper, we study in detail the case where the additive Gaussian noise process
is a moving average process of order one (MA(1)). We define the Gaussian MA(1)
noise process with parameter $\alpha$, $|\alpha| \le 1$, as

$$Z_i = \alpha U_{i-1} + U_i \tag{5}$$

where $\{U_i\}_{i=0}^{\infty}$ is a white Gaussian innovation process. Without loss of generality, we will
assume that $U_i$, $i = 0, 1, \ldots$, has unit variance. There are alternative ways of defining
Gaussian MA(1) processes, which we will review in Section 2.

Note that the condition $|\alpha| \le 1$ is not restrictive. When $|\alpha| > 1$, it can be readily
verified that the process $\{Z_i\}$ has the same distribution as the process $\{\tilde{Z}_i\}$ defined by

$$\tilde{Z}_i = \alpha(\beta U_{i-1} + U_i)$$

where the moving average parameter $\beta$ is given by $\beta = 1/\alpha$, thus giving $|\beta| < 1$.
We state the main theorem, the proof of which will be given in Section 3.
Theorem 1. For the additive Gaussian MA(1) noise channel $Y_i = X_i + Z_i$, $i = 1, 2, \ldots$,
with the Gaussian MA(1) noise process $\{Z_i\}$ defined in (5), the feedback capacity $C_{FB}$ under
the power constraint $\sum_{i=1}^{n} E X_i^2 \le nP$ is given by

$$C_{FB} = -\log x_0,$$

where $x_0$ is the unique positive root of the fourth-order polynomial

$$P x^2 = (1 - x^2)(1 - |\alpha| x)^2. \tag{6}$$
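As a numerical illustration of Theorem 1, the sketch below locates the root $x_0$ of (6) by bisection on $(0, 1)$ and returns $-\log x_0$ in nats. The function name `ma1_feedback_capacity` is an illustrative choice; the sketch assumes $P > 0$ and $|\alpha| \le 1$, in which case the left side of (6) increases and the right side decreases on $[0, 1]$, so a sign change brackets the root.

```python
import math

def ma1_feedback_capacity(P, alpha, iterations=100):
    """Return (x0, -log x0) for P x^2 = (1 - x^2)(1 - |alpha| x)^2, assuming P > 0, |alpha| <= 1."""
    a = abs(alpha)

    def f(x):
        # Difference of the two sides of (6); f(0) = -1 < 0 and f(1) = P > 0.
        return P * x * x - (1.0 - x * x) * (1.0 - a * x) ** 2

    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    return x0, -math.log(x0)

# White noise (alpha = 0): (6) reduces to P x^2 = 1 - x^2, so the capacity is (1/2) log(1 + P).
x0_white, c_white = ma1_feedback_capacity(1.0, 0.0)
# A coloured example with alpha = 0.5 at the same P.
x0_ma, c_ma = ma1_feedback_capacity(1.0, 0.5)
```

Dividing the returned value by `math.log(2)` converts the rate to bits per transmission.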

As will be shown later in Sections 3 and 4, the feedback capacity $C_{FB}$ is achieved by
an asymptotically stationary ergodic input process $\{X_i\}$ satisfying $E X_i^2 = P$ for all $i$.
Thus, by the ergodic theorem, the feedback capacity does not diminish under a more restrictive
power constraint

$$\frac{1}{n} \sum_{i=1}^{n} X_i^2(W, Y^{i-1}) \le P.$$

(See also the arguments given in [2, Section VIII] based on the stationarity of the noise
process.)

The literature on Gaussian feedback channels is vast. We first mention some prior 
work closely related to our main discussion. In earlier work, Schalkwijk and Kailath |31 
(see also the discussion by Wolfowitz [Bj) considered the feedback over the additive white 
Gaussian noise channel, and proposed a simple linear signalling scheme that achieves the 
feedback capacity. The coding scheme by Schalkwijk and Kailath can be summarized as 
follows: Let $\theta$ be one of $2^{nR}$ equally spaced real numbers on some interval, say, $[0, 1]$. At
time $k$, the receiver forms the maximum likelihood estimate $\hat{\theta}_k(Y_1, \ldots, Y_k)$ of $\theta$. Using the
feedback information, at time $k + 1$, we send $X_{k+1} = \gamma_k(\theta - \hat{\theta}_k)$, where $\gamma_k$ is a scaling factor
properly chosen to meet the power constraint. After $n$ transmissions, the receiver finds
the value of $\theta$ among $2^{nR}$ alternatives that is closest to $\hat{\theta}_n$. This simple signalling scheme,
without any coding, achieves the feedback capacity. As is shown by Shannon [7], feedback
does not increase the capacity of memoryless channels. (See also Kadota et al. jHl El for 
continuous cases.) The benefit of feedback, however, does not consist of the simplicity of 
coding only. The probability of decoding error of the Schalkwijk-Kailath scheme decays 
doubly exponentially in the duration of communication, compared to the exponential decay 
for the nonfeedback scenario. In fact, there exists a feedback coding scheme such that the 
probability of decoding error decreases more rapidly than the exponential of any order ^01 
Later Schalkwijk extended his work to the center-of-gravity information feedback 
for higher dimensional signal spaces [T^ . 
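The iterative refinement described above can be sketched in a few lines for the unit-variance white Gaussian noise channel. This is a simplified textbook-style variant, not the exact construction of Schalkwijk and Kailath: it assumes $\theta \in [0, 1]$, sends $\sqrt{P}\,\theta$ first, and then updates the estimate with a linear minimum mean-square error step. The function name and argument order are illustrative.

```python
import math
import random

def schalkwijk_kailath(theta, n, P, rng):
    """Return (estimate of theta, variance of the estimation error) after n channel uses."""
    # First use: send sqrt(P) * theta; the power is at most P because theta**2 <= 1.
    y = math.sqrt(P) * theta + rng.gauss(0.0, 1.0)
    theta_hat = y / math.sqrt(P)
    var = 1.0 / P                          # variance of (theta - theta_hat)
    for _ in range(n - 1):
        gamma = math.sqrt(P / var)         # scaling that gives the input power exactly P
        x = gamma * (theta - theta_hat)    # the transmitter knows theta_hat through feedback
        y = x + rng.gauss(0.0, 1.0)
        # Linear MMSE estimate of the current error from y, added to the running estimate.
        theta_hat += (gamma * var / (P + 1.0)) * y
        var /= (1.0 + P)                   # error variance shrinks by 1/(1+P) per use
    return theta_hat, var

rng = random.Random(0)
theta_hat, var = schalkwijk_kailath(theta=0.3, n=20, P=1.0, rng=rng)
```

Because the error standard deviation decays like $(1 + P)^{-n/2}$, a grid of $2^{nR}$ points stays resolvable for rates $R$ below $\frac{1}{2}\log_2(1 + P)$, which is the intuition behind the capacity-achieving claim.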

Butman J3] generalized the linear coding scheme of Schalkwijk and Kailath for white 
noise processes to autoregressive (AR) noise processes. For first-order autoregressive 
(AR(1)) processes $\{Z_i\}_{i=1}^{\infty}$ with regression parameter $\alpha$, $|\alpha| < 1$, defined by

$$Z_i = \alpha Z_{i-1} + U_i,$$

he obtained a lower bound on the feedback capacity as $-\log x_0$, where $x_0$ is the unique
positive root of the fourth-order polynomial

$$P x^2 = \frac{1 - x^2}{(1 + |\alpha| x)^2}. \tag{7}$$

This rate has been shown to be optimal among a certain class of linear feedback schemes 
by Wolfowitz jT3j and Tiernan [1^] and is strongly believed to be the capacity of the 
AR(1) feedback capacity. Tiernan and Schalkwijk ^Tj found an upper bound of the AR(1) 
feedback capacity, which meets Butman's lower bound for very low and very high signal-to- 
noise ratio. Butman ^H] also obtained capacity upper and lower bounds for AR processes 
with higher order. 

For the case of moving average (MA) noise processes, there are far fewer results in the 
literature, although MA processes are usually more tractable than AR processes of the same 
order. Ozarow 1201 gave upper and lower bounds of the feedback capacity for AR(1) 
and MA(1) channels and showed that feedback strictly increases the capacity. Substantial 
progress was made by Ordentlich [21]; he observed that $K_V$ in (3) is at most of rank $k$ for an
MA noise process of order $k$. He also showed that the optimal $(K_V, B)$ necessarily has the
property that the current input signal $X_k$ is orthogonal to the past outputs $(Y_1, \ldots, Y_{k-1})$.
For the special case of MA(1) processes, this development, combined with the arguments 
given in jT^j, suggests that a linear signalling scheme similar to the Schalkwijk-Kailath 
scheme be optimal, which is proved by our Theorem 1. 

A recent report by Yang, Kavcic, and Tatikonda (see also Yang's thesis [22]) studies
the feedback capacity of the general ARMA($k$) case using the state-space model and offers
a conjecture on the feedback capacity as a solution to an optimization problem that does
not depend on the horizon $n$. For the special case $k = 1$ with the noise process $\{Z_i\}$
defined by

$$Z_i = \beta Z_{i-1} + \alpha U_{i-1} + U_i, \qquad |\alpha|, |\beta| < 1,$$

they conjecture that the Schalkwijk-Kailath-Butman scheme is optimal. The corresponding
achievable rate can be written in a closed form as $-\log x_0$, where $x_0$ is the unique positive
root of the fourth-order polynomial

$$P x^2 = \frac{(1 - x^2)(1 - \sigma \alpha x)^2}{(1 + \sigma \beta x)^2}$$

and

$$\sigma = \begin{cases} 1, & \alpha + \beta \ge 0, \\ -1, & \alpha + \beta < 0. \end{cases}$$

By taking $\beta = 0$ or $\alpha = 0$, we can easily recover (6) and (7), respectively. Thus, in the
special case $\beta = 0$, our Theorem 1 confirms the Yang-Kavcic-Tatikonda conjecture.

To conclude this section, we review, in a rather incomplete manner, previous work on 
the Gaussian feedback channel in addition to aforementioned results, and then point out 
where the current work lies in the literature. The standard literature on the Gaussian feedback
channel and associated simple feedback coding schemes traces back to a 1956 paper
by Elias ^ and its sequels [23 123- Turin (2312111211, Horstein jHOl, Khas'minskii [211, 
and Ferguson (321 studied a sequential binary signalling scheme over the Gaussian feed- 
back channel with symbol-by- symbol decoding that achieves the feedback capacity with 
an error exponent better than the nonfeedback case. As mentioned above, Schalkwijk and 
Kailath jH El El made a major breakthrough by showing that a simple linear feedback 
coding scheme achieves the feedback capacity with doubly exponentially decreasing prob- 
ability of decoding error. This fascinating result has been extended in many directions. 
Omura [321 reformulated the feedback communication problem as a stochastic-control prob- 
lem and applied this approach to multiplicative and additive noise channels with noiseless 
feedback and to additive noise channels with noisy feedback. Pinsker ^U], Kramer [TTj . 
and Zigangirov [T21 studied feedback coding schemes under which the probability of decod- 
ing error decays as the exponential of arbitrary high order. Wyner [33] and Kramer 
studied the performance of the Schalkwijk-Kailath scheme under a peak power constraint 
and reported the singly exponential behavior of the probability of decoding error under a 
peak power constraint. The actual error exponent of the Gaussian feedback channel under 
the peak power constraint was later obtained by Schalkwijk and Barron [H^l- Kashyap [3B] . 
Lavenberg [371 EHl and Kramer ^I] looked at the case of noisy or intermittent feedback. 

The more natural question of transmitting a Gaussian source over a Gaussian feedback 
channel was studied by Kailath [311, Cruise [101, Schalkwijk and Bluestein [3J, Ovsee- 
vich [121, Ihara [131. There are also many notable extensions of the Schalkwijk-Kailath 
scheme in the area of multiple user information theory. Using the Schalkwijk-Kailath 
scheme, Ozarow and Leung- Yan-Cheong [H] showed that feedback increases the capac- 
ity region of stochastically degraded broadcast channels, which is rather surprising since 
feedback does not increase the capacity region of physically degraded broadcast channels, 
as shown by El Gamal [121. Ozarow [IHl also established the feedback capacity region of 
two-user white Gaussian multiple access channel through a very innovative application of 
the Schalkwijk-Kailath coding scheme. The extension to a larger number of users was at- 
tempted by Kramer [IT] , where he also showed that feedback increases the capacity region 
of strong interference channels. 

Following these results on the white Gaussian noise channel on hand, the next focus 
was on the feedback capacity of the colored Gaussian noise channel. Butman [21 El 
extended the Schalkwijk-Kailath coding scheme to autoregressive noise channels. Subse- 
quently, Tiernan and Schalkwijk [TTl UHl, Wolfowitz [T31, Ozarow [THl [201, Dembo [5Uj . 
and Yang et al. [221 studied the feedback capacity of finite-order ARMA additive Gaussian 
noise channels and obtained many interesting upper and lower bounds. Using an asymptotic
equipartition theorem for nonstationary nonergodic Gaussian noise processes, Cover
and Pombra [2] obtained the $n$-block capacity (3) for the arbitrary colored Gaussian channel
with or without feedback. (We can take $B = 0$ in (3) for the nonfeedback case.) Using
matrix inequalities, they also showed that feedback does not increase the capacity much;
namely, feedback increases the capacity at most twice (a result obtained by Pinsker [IHl 
and Ebert [IH]); and feedback increases the capacity at most by half a bit.

The extensions and refinements of the result by Cover and Pombra abound. Dembo [50]
showed that the feedback does not increase the capacity at very low signal-to-noise ratio 
or very high signal-to-noise ratio. As mentioned above, Ordentlich [21] examined the 
properties of the optimal solution $(K_V, B)$ in (3) and found the rank condition of $K_V$ for
finite-order MA noise processes. Chen and Yanagi jHU E21 US] studied Cover's conjec- 
ture [HI] that the feedback capacity is at most as large as the nonfeedback capacity with 
twice the power, and made several refinements on the upper bounds by Cover and Pom- 
bra. Thomas [53], Pombra and Cover (HHl; and Ordentlich jSTj extended the factor-of-two 
bound result to the colored Gaussian multiple access channels with feedback. Recently 
Yang, Kavcic, and Tatikonda [221 revived the control-theoretic approach (cf. [SS]) to the 
stationary ARMA($k$) Gaussian feedback capacity problem. Although a one-sentence summary
would not do justice to their contribution, Yang et al. reformulated the feedback
capacity problem as a stochastic control problem and used dynamic programming for the
numerical computation of the n-block feedback capacity. In a series of papers |3H1 1^ IHUj. 
Ihara obtained coding theorems for continuous-time Gaussian channels with feedback and 
showed that the factor-of-two bound on the feedback capacity is tight by considering clev- 
erly constructed nonstationary channels both in discrete time [HJ and continuous time [HUj . 
(See also [H21 Examples 5.7.2 and 6.8.1].) In fact, besides the white Gaussian noise channel, 
Ihara's example is the only nontrivial channel with known closed-form feedback capacity.

Hence Theorem 1 provides the first feedback capacity result on stationary colored Gaus- 
sian channels. Moreover, as will be discussed in Section 4, a simple linear signalling scheme 
similar to the Schalkwijk-Kailath scheme achieves the feedback capacity. This result links 
the Cover-Pombra formulation of the feedback capacity with the Schalkwijk-Kailath scheme 
and its generalizations to stationary colored channels, and provides new hope for the op- 
timality of the achievable rate for the AR(1) channel obtained by Butman [T^ . 



2 First-Order Moving Average Gaussian Processes 

In this section, we digress a little to review a few characteristics of first-order moving
average Gaussian processes. First, we give three alternative characterizations of Gaussian
MA(1) processes. As defined in the previous section, the Gaussian MA(1) noise process
$\{Z_i\}_{i=1}^{\infty}$ with parameter $\alpha$ can be characterized as

$$Z_i = \alpha U_{i-1} + U_i, \tag{8}$$

where the innovations $U_0, U_1, \ldots$ are i.i.d. $\sim N(0, 1)$.

We reinterpret the above definition in (8) by regarding the noise process $\{Z_i\}$ as the
output of the linear time-invariant filter with transfer function

$$H(z) = 1 + \alpha z^{-1}, \tag{9}$$

which is driven by the white innovation process $\{U_i\}$. Thus we alternatively characterize
the Gaussian MA(1) noise process $\{Z_i\}$ with parameter $\alpha$ and unit innovation through its
power spectral density $S_Z(\omega)$ given by

$$S_Z(\omega) = |1 + \alpha e^{-i\omega}|^2 = 1 + \alpha^2 + 2\alpha \cos\omega. \tag{10}$$


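We can further identify the power spectral density $S_Z(\omega)$ with the infinite Toeplitz
covariance matrix of a Gaussian process. Thus, we can define $\{Z_i\}$ as $Z^n \sim N_n(0, K_Z)$
for each finite horizon $n$, where $K_Z$ is tri-diagonal with

$$K_Z = \begin{bmatrix}
1 + \alpha^2 & \alpha & & & \\
\alpha & 1 + \alpha^2 & \alpha & & \\
 & \ddots & \ddots & \ddots & \\
 & & \alpha & 1 + \alpha^2 & \alpha \\
 & & & \alpha & 1 + \alpha^2
\end{bmatrix},$$

or equivalently,

$$[K_Z]_{ij} = \begin{cases}
1 + \alpha^2, & |i - j| = 0, \\
\alpha, & |i - j| = 1, \\
0, & |i - j| \ge 2.
\end{cases}$$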

Note that this covariance matrix $K_Z$ is consistent with our initial definition of the MA(1)
process given in (8). Thus all three definitions of the MA(1) process given above are
equivalent. As we will see in the next section, the special structure of the MA(1) process,
especially the tri-diagonality of the covariance matrix, makes the maximization in (3) easier
than the generic case.
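The equivalence of the three characterizations can be checked numerically. The sketch below is only an illustration with hypothetical helper names: it builds the tri-diagonal covariance directly, rebuilds it from the moving-average definition $Z_i = \alpha U_{i-1} + U_i$ (innovations $U_0, \ldots, U_n$), and compares the two closed forms of the power spectral density on a frequency grid.

```python
import numpy as np

def ma1_covariance(n, alpha):
    """Tri-diagonal Toeplitz K_Z: 1 + alpha^2 on the diagonal, alpha on the two neighbours."""
    return (1.0 + alpha**2) * np.eye(n) + alpha * (np.eye(n, k=1) + np.eye(n, k=-1))

def ma1_mixing_matrix(n, alpha):
    """n x (n+1) matrix A with (Z_1..Z_n)^T = A (U_0..U_n)^T for Z_i = alpha*U_{i-1} + U_i."""
    A = np.zeros((n, n + 1))
    for i in range(n):
        A[i, i] = alpha       # coefficient of U_i in Z_{i+1}
        A[i, i + 1] = 1.0     # coefficient of U_{i+1} in Z_{i+1}
    return A

n, alpha = 6, 0.7
K_direct = ma1_covariance(n, alpha)
A = ma1_mixing_matrix(n, alpha)
K_from_definition = A @ A.T   # covariance of A U when U has identity covariance

omega = np.linspace(-np.pi, np.pi, 101)
S_filter = np.abs(1.0 + alpha * np.exp(-1j * omega)) ** 2   # |H(e^{i omega})|^2
S_closed = 1.0 + alpha**2 + 2.0 * alpha * np.cos(omega)     # right side of (10)
```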

We will need the entropy rate of the MA(1) Gaussian process later in our discussion. 
As shown by Kolmogorov (see ^ Section 11.6]), the entropy rate of a stationary Gaussian 
process with power spectral density $S(\omega)$ can be expressed as

$$\frac{1}{4\pi} \int_{-\pi}^{\pi} \log\bigl(2\pi e\, S(\omega)\bigr)\, d\omega.$$

We can calculate the above integral with the power spectral density Sz{uj) in (llUj) by 
Jensen's^ formula ESI Theorem 15.181 



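$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \log\bigl|e^{i\omega} + \alpha\bigr|\, d\omega =
\begin{cases} 0, & |\alpha| \le 1, \\ \log|\alpha|, & |\alpha| > 1, \end{cases} \tag{11}$$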



^The same J. L. W. V. Jensen famous for his inequality on convex functions. 
and obtain the entropy rate of the MA(1) Gaussian process (8) as

$$\frac{1}{4\pi} \int_{-\pi}^{\pi} \log\bigl(2\pi e\, S_Z(\omega)\bigr)\, d\omega
= \frac{1}{4\pi} \int_{-\pi}^{\pi} \log\bigl(2\pi e\, |1 + \alpha e^{-i\omega}|^2\bigr)\, d\omega
= \begin{cases} \frac{1}{2} \log(2\pi e), & |\alpha| \le 1, \\[2pt] \frac{1}{2} \log(2\pi e\, \alpha^2), & |\alpha| > 1. \end{cases} \tag{12}$$

(One can alternatively deal with the determinant of $K_Z(n)$ directly by a simple recursion.
For example, we can show that $\det K_Z(n) = n + 1$ for $|\alpha| = 1$.) For a more general
discussion of the entropy rate of stationary Gaussian processes, refer to jHS Chapter 2]. 
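Both routes to the entropy rate can be sanity-checked numerically. The sketch below, with illustrative function names, integrates the spectral formula on a uniform grid and evaluates the determinant through the standard three-term recursion for tri-diagonal Toeplitz matrices, $D_n = (1 + \alpha^2) D_{n-1} - \alpha^2 D_{n-2}$ with $D_0 = 1$ and $D_1 = 1 + \alpha^2$. The recursion is a textbook identity supplied here as an assumption about which "simple recursion" is meant. The numerical integral is used only for $|\alpha| \ne 1$, where the integrand has no logarithmic singularity.

```python
import numpy as np

def ma1_entropy_rate_numeric(alpha, num_points=1 << 14):
    """(1/4pi) * integral over [-pi, pi] of log(2 pi e S_Z(w)) dw, via a uniform-grid average."""
    omega = -np.pi + 2.0 * np.pi * (np.arange(num_points) + 0.5) / num_points
    S = 1.0 + alpha**2 + 2.0 * alpha * np.cos(omega)
    # (1/4pi) * (2pi) * mean(...) = mean(...) / 2
    return 0.5 * np.mean(np.log(2.0 * np.pi * np.e * S))

def ma1_entropy_rate_closed_form(alpha):
    """Right side of (12)."""
    return 0.5 * np.log(2.0 * np.pi * np.e * max(1.0, alpha**2))

def ma1_det(n, alpha):
    """det K_Z(n) for n >= 1 via D_n = (1 + alpha^2) D_{n-1} - alpha^2 D_{n-2}."""
    d_prev, d = 1.0, 1.0 + alpha**2          # D_0, D_1
    for _ in range(n - 1):
        d_prev, d = d, (1.0 + alpha**2) * d - alpha**2 * d_prev
    return d

h_small = ma1_entropy_rate_numeric(0.5)
h_large = ma1_entropy_rate_numeric(2.0)
det_unit = ma1_det(5, 1.0)
```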

We finish our digression by noting a certain reciprocal relationship between the Gaussian
MA(1) process with parameter $\alpha$ and the Gaussian AR(1) process with parameter $-\alpha$. We
can define the Gaussian AR(1) process $\{Z_i\}_{i=1}^{\infty}$ with parameter $-\alpha$, $|\alpha| < 1$, as

$$Z_i = -\alpha Z_{i-1} + U_i,$$

where the innovations $U_1, U_2, \ldots$ are i.i.d. $\sim N(0, 1)$ and $Z_0 \sim N(0, 1/(1 - \alpha^2))$ is
independent of $\{U_i\}_{i=1}^{\infty}$. Equivalently, we can define the above process as the output of the linear
time-invariant filter with transfer function

$$G(z) = \frac{1}{1 + \alpha z^{-1}} = \frac{1}{H(z)},$$

where $H(z)$ is the transfer function (9) of the MA(1) process with parameter $\alpha$. This
reciprocity is indeed reflected in the striking similarity between the fourth-order polynomial
(6) for the capacity of the Gaussian MA(1) noise channel and the fourth-order polynomial
(7) for the best known achievable rate of the Gaussian AR(1) noise channel.



3 Proof of Theorem 1 



We will first transform the optimization problem

$$C_{n,FB} = \max_{K_V, B} \frac{1}{2n} \log \frac{\det\bigl((B + I) K_Z (B + I)^T + K_V\bigr)}{\det(K_Z)}$$

to a series of (asymptotically) equivalent forms. Then we solve the problem by imposing
individual power constraints $(P_1, \ldots, P_n)$ on each input signal. Subsequently we optimize
over $(P_1, \ldots, P_n)$ under the average power constraint

$$P_1 + \cdots + P_n \le nP.$$

Then, using Lemma 2, we will prove that the uniform power allocation $P_1 = \cdots = P_n = P$
is asymptotically optimal. This leads to the closed-form solution given in Theorem 1.
Step 1. Transformations into equivalent optimization problems.

Recall that we wish to solve the optimization problem:

$$\text{maximize} \quad \log\det\bigl((B + I) K_Z (B + I)^T + K_V\bigr) \tag{13}$$

over all nonnegative definite $K_V$ and strictly lower triangular $B$ satisfying
$\operatorname{tr}(B K_Z B^T + K_V) \le nP$. We approximate the covariance matrix $K_Z$ of the given MA(1) noise process
with parameter $\alpha$ by another covariance matrix $K_Z'$. Define $K_Z' = H_Z H_Z^T$ where the
lower-triangular Toeplitz matrix $H_Z$ is given by

$$H_Z = \begin{bmatrix}
1 & & & \\
\alpha & 1 & & \\
 & \ddots & \ddots & \\
 & & \alpha & 1
\end{bmatrix}.$$

This matrix $K_Z'$ is a covariance matrix of the Gaussian process $\{\tilde{Z}_i\}_{i=1}^{\infty}$ defined by

$$\tilde{Z}_1 = U_1, \qquad \tilde{Z}_i = U_i + \alpha U_{i-1}, \quad i = 2, 3, \ldots,$$

where $\{U_i\}_{i=1}^{\infty}$ is the white Gaussian process with unit variance. It is easy to check that
$K_Z \succeq K_Z'$ (i.e., $K_Z - K_Z'$ is nonnegative definite) and that the difference between $K_Z$ and
$K_Z'$ is given by

$$(K_Z - K_Z')_{ij} = \begin{cases} \alpha^2, & i = j = 1, \\ 0, & \text{otherwise}. \end{cases}$$

It is intuitively clear that there is no asymptotic difference in capacity between the channel
with the original noise covariance $K_Z$ and the channel with noise covariance $K_Z'$. We will
prove this claim more rigorously in the Appendix. Throughout we will assume that the
noise covariance matrix of the given channel is $K_Z'$, which is equivalent to the statement
that the time-zero noise innovation $U_0$ is revealed to both the transmitter and the receiver.
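The relationship between the two covariance matrices in Step 1 is easy to confirm numerically. In the sketch below (variable names are illustrative), $H_Z$ is the lower-triangular Toeplitz factor with ones on the diagonal and $\alpha$ on the subdiagonal, and the difference between the tri-diagonal $K_Z$ and $H_Z H_Z^T$ is checked to be $\alpha^2$ in the top-left entry and zero elsewhere.

```python
import numpy as np

n, alpha = 6, 0.7

# Lower-triangular Toeplitz factor: 1 on the diagonal, alpha on the first subdiagonal.
H_Z = np.eye(n) + alpha * np.eye(n, k=-1)
K_Z_prime = H_Z @ H_Z.T

# Stationary MA(1) covariance: 1 + alpha^2 on the diagonal, alpha on both neighbours.
K_Z = (1.0 + alpha**2) * np.eye(n) + alpha * (np.eye(n, k=1) + np.eye(n, k=-1))

difference = K_Z - K_Z_prime
expected = np.zeros((n, n))
expected[0, 0] = alpha**2        # only the (1,1) entry differs (0-based index [0, 0])

# Nonnegative definiteness of the difference, checked through its eigenvalues.
min_eigenvalue = np.linalg.eigvalsh(difference).min()
```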

Now by identifying $K_V = F_V F_V^T$ for some lower-triangular $F_V$ and identifying
$F_Z = B H_Z$ for some strictly lower-triangular $F_Z$, we transform the optimization problem (13)
into

$$\begin{aligned}
\text{maximize} \quad & \log\det\bigl(F_V F_V^T + (F_Z + H_Z)(F_Z + H_Z)^T\bigr) \\
\text{subject to} \quad & \operatorname{tr}(F_V F_V^T + F_Z F_Z^T) \le nP
\end{aligned} \tag{14}$$

with new variables $(F_V, F_Z)$.

We shall use $2n$-dimensional row vectors $f_i$ and $h_i$, $i = 1, \ldots, n$, to denote the $i$-th
row of $F := [F_V \; F_Z]$ and $H := [0_{n \times n} \; H_Z]$, respectively. There is an obvious identification
between the time-$i$ input signal $X_i$ and the vector $f_i$, $i = 1, \ldots, n$, for we can regard $f_i$
as a point in the Hilbert space with the innovations of $V^n$ and $Z^n$ as a basis. We can
similarly identify $Z_i$ with $h_i$ and identify $Y_i$ with $f_i + h_i$. We also introduce new variables
$(P_1, \ldots, P_n)$ representing the power constraint for each input $f_i$. Now the optimization
problem in (14) becomes the following equivalent form:

$$\begin{aligned}
\text{maximize} \quad & \log\det\bigl((F + H)(F + H)^T\bigr) \\
\text{subject to} \quad & \|f_i\|^2 \le P_i, \quad i = 1, \ldots, n, \\
& \textstyle\sum_{i=1}^{n} P_i \le nP.
\end{aligned} \tag{15}$$

Here $\|\cdot\|$ denotes the Euclidean norm of a $2n$-dimensional vector. Note that the variables
$f_1, \ldots, f_n$ should satisfy $f_i \in V_i$, $i = 1, \ldots, n$, where

$$V_i := \{(v_1, \ldots, v_{2n}) \in \mathbb{R}^{2n} : v_{i+1} = \cdots = v_n = 0 = v_{n+i} = \cdots = v_{2n}\}.$$



Step 2. Optimization under the individual power constraint for each signal. 

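We solve the optimization problem (15) in $(f_1, \ldots, f_n)$ after fixing $(P_1, \ldots, P_n)$. This
step is mostly algebraic, but we can easily give a geometric interpretation. We need some
notation first.

We define an $n$-by-$2n$ matrix

$$S = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}
:= \begin{bmatrix} f_1 + h_1 \\ \vdots \\ f_n + h_n \end{bmatrix} = F + H,$$

and we define the $n$-by-$2n$ matrix $E$ by

$$E = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} := [\, 0_{n \times n} \;\; I \,],$$

where $I$ is the identity. We also define an $n$-by-$2n$ matrix

$$G = \begin{bmatrix} g_1 \\ \vdots \\ g_n \end{bmatrix}
:= \begin{bmatrix} h_1 - e_1 \\ \vdots \\ h_n - e_n \end{bmatrix} = H - E.$$

We can interpret the row vector $e_i$ as the noise innovation $U_i$ and the row vector $g_i$ as
$Z_i - U_i$.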
We will use the notation $F_k$ to denote the $k$-by-$2n$ submatrix of $F$ which consists of
the first $k$ rows of $F$, that is,

$$F_k = \begin{bmatrix} f_1 \\ \vdots \\ f_k \end{bmatrix}.$$

We will use similar notation for the $k$-by-$2n$ submatrices of $G$, $H$, $E$, and $S$.

We now introduce a sequence of $2n$-by-$2n$ matrices $\{\Pi_k\}$ as

$$\Pi_k = I - S_k^T (S_k S_k^T)^{-1} S_k.$$

Observe that $S_k$ is of full rank and thus that $(S_k S_k^T)^{-1}$ always exists. We can view $\Pi_k$ as a
map of a $2n$-dimensional row vector (acting from the right) to its component orthogonal to
the subspace spanned by the rows $s_1, \ldots, s_k$ of $S_k$. (Or $\Pi_k$ maps a generic random variable
$A$ to $A - E(A \mid Y^k)$.) It is easy to verify that $\Pi_k = \Pi_k^T = \Pi_k \Pi_k$ and $\Pi_k S_k^T = 0$.

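Finally we define the intermediate objective functions of the maximization (15) as

$$J_k(P_1, \ldots, P_k) := \max_{f_1, \ldots, f_k} \log\det(S_k S_k^T), \qquad k = 1, \ldots, n,$$

where the maximum is over $f_i \in V_i$ with $\|f_i\|^2 \le P_i$, $i = 1, \ldots, k$, so that

$$C_{n,FB} = \max_{P_1, \ldots, P_n :\, \sum_i P_i \le nP} \frac{1}{2n} J_n(P_1, \ldots, P_n).$$

We will show that if $(f_1^*, \ldots, f_{k-1}^*)$ maximizes $J_{k-1}(P_1, \ldots, P_{k-1})$, then $(f_1^*, \ldots, f_{k-1}^*, f_k^*)$
maximizes $J_k(P_1, \ldots, P_k)$ for some $f_k^*$ satisfying $f_k^* = f_k^* \Pi_{k-1}$. Thus the maximization for
$J_n$ can be solved in a greedy fashion by sequentially maximizing $J_1, J_2, \ldots, J_n$ through
$f_1^*, \ldots, f_n^*$. Furthermore, we will obtain the recursive relationship

$$J_0 := 0, \tag{16}$$
$$J_1 = \log(1 + P_1), \tag{17}$$
$$J_{k+1} = J_k + \log\Bigl(1 + \bigl(\sqrt{P_{k+1}} + |\alpha| \sqrt{1 - e^{-(J_k - J_{k-1})}}\,\bigr)^2\Bigr), \qquad k = 1, 2, \ldots. \tag{18}$$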



We need the following result to proceed to the actual maximization. 

Lemma 1. Suppose $P > 0$ and $1 \le k \le n - 1$. Suppose $S_k$ and $\Pi_k$ are defined as above. Let
$V$ be an arbitrary subspace of $\mathbb{R}^{2n}$ such that $V$ is not contained in the span of $s_1, \ldots, s_k$.
Then, for any $w \in V$,

$$\max_{v \in V :\, \|v\|^2 \le P} (v + w)\, \Pi_k\, (v + w)^T = \bigl(\sqrt{P} + \|w \Pi_k\|\bigr)^2.$$

Furthermore, if $w \Pi_k \ne 0$, the maximum is attained by

$$v^* = \sqrt{P}\, \frac{w \Pi_k}{\|w \Pi_k\|}. \tag{19}$$

Proof. When $w \Pi_k = 0$, that is, $w \in \operatorname{span}\{s_1, \ldots, s_k\}$, the maximum of
$(v + w)\, \Pi_k\, (v + w)^T = v \Pi_k v^T$ is attained by any vector $v$, $\|v\|^2 = P$, orthogonal to
$\operatorname{span}\{s_1, \ldots, s_k\}$, and we trivially have

$$\max_{v \in V :\, \|v\|^2 \le P} v \Pi_k v^T = P.$$

When $w \Pi_k \ne 0$, we have

$$\begin{aligned}
(v + w)\, \Pi_k\, (v + w)^T &= \|(v + w \Pi_k) \Pi_k\|^2 \\
&\le \|v + w \Pi_k\|^2 \\
&\le \bigl(\sqrt{P} + \|w \Pi_k\|\bigr)^2,
\end{aligned}$$

where the first inequality follows from the fact that $I - \Pi_k$ is nonnegative definite. It is
easy to check that we have equality if $v$ is given by (19). □
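A quick numerical check of Lemma 1 may help build intuition. The sketch below is an illustration under simplifying assumptions: it takes $V$ to be the whole space (so the candidate maximizer (19) is automatically in $V$), draws a random full-rank $S_k$ and a random $w$, and compares the value at $v^*$ and at many random feasible $v$ against the bound $(\sqrt{P} + \|w \Pi_k\|)^2$. Dimensions and names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, P = 8, 3, 2.0                       # ambient dimension, number of rows of S_k, power

S = rng.standard_normal((k, m))           # random S_k, full rank with probability one
Pi = np.eye(m) - S.T @ np.linalg.inv(S @ S.T) @ S   # projector onto the orthogonal complement

w = rng.standard_normal(m)
w_perp = w @ Pi                           # component of w orthogonal to the rows of S_k
bound = (np.sqrt(P) + np.linalg.norm(w_perp)) ** 2

# Candidate maximizer from (19): all power aligned with w_perp.
v_star = np.sqrt(P) * w_perp / np.linalg.norm(w_perp)
value_at_v_star = (v_star + w) @ Pi @ (v_star + w)

# Random feasible vectors with ||v||^2 <= P should never exceed the bound.
gaps = []
for _ in range(1000):
    v = rng.standard_normal(m)
    v *= np.sqrt(P) * rng.uniform() / np.linalg.norm(v)
    gaps.append(bound - (v + w) @ Pi @ (v + w))
min_gap = min(gaps)
```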



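We observe that, for $k \ge 2$,

$$\begin{aligned}
\det(S_k S_k^T)
&= \det\left( \begin{bmatrix} S_{k-1} \\ s_k \end{bmatrix} \begin{bmatrix} S_{k-1} \\ s_k \end{bmatrix}^T \right)
= \det \begin{bmatrix} S_{k-1} S_{k-1}^T & S_{k-1} s_k^T \\ s_k S_{k-1}^T & s_k s_k^T \end{bmatrix} \\
&= \det(S_{k-1} S_{k-1}^T) \cdot s_k \bigl(I - S_{k-1}^T (S_{k-1} S_{k-1}^T)^{-1} S_{k-1}\bigr) s_k^T \\
&= \det(S_{k-1} S_{k-1}^T) \cdot s_k \Pi_{k-1} s_k^T \\
&= \det(S_{k-1} S_{k-1}^T) \cdot (f_k + g_k + e_k)\, \Pi_{k-1}\, (f_k + g_k + e_k)^T \\
&= \det(S_{k-1} S_{k-1}^T) \cdot \bigl[1 + (f_k + g_k)\, \Pi_{k-1}\, (f_k + g_k)^T\bigr],
\end{aligned} \tag{20}$$

where the last equality follows since $e_k \Pi_{k-1} = e_k$, $e_k e_k^T = 1$, and $e_k g_k^T = e_k f_k^T = 0$. Now fix $f_1, \ldots, f_{k-1}$.
Since $V_k$ is not contained in $\operatorname{span}\{s_1, \ldots, s_{k-1}\}$ and $g_k \in V_k$, we have from the above lemma
and (20) that

$$\max_{f_k :\, \|f_k\|^2 \le P_k} \det(S_k S_k^T) = \det(S_{k-1} S_{k-1}^T) \cdot \Bigl(1 + \bigl(\sqrt{P_k} + \|g_k \Pi_{k-1}\|\bigr)^2\Bigr). \tag{21}$$

If $\alpha \ne 0$, the maximum is attained by

$$f_k^* = \sqrt{P_k}\, \frac{g_k \Pi_{k-1}}{\|g_k \Pi_{k-1}\|}. \tag{22}$$

In the special case $\alpha = 0$, that is, when the noise is white, we trivially have

$$\max_{f_k :\, \|f_k\|^2 \le P_k} \det(S_k S_k^T) = \det(S_{k-1} S_{k-1}^T) \cdot (1 + P_k),$$

which immediately implies that $J_k = J_{k-1} + \log(1 + P_k) = \sum_{i=1}^{k} \log(1 + P_i)$, which, in turn,
combined with the concavity of the logarithm, implies that

$$C_{n,FB} = C_{FB} = \frac{1}{2} \log(1 + P).$$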



We continue our discussion throughout this step under the assumption $\alpha \ne 0$. Until
this point we have not used the special structure of the MA(1) noise process. Now we rely
heavily on it. We trivially have

$$J_1 = \max_{f_1} \log(s_1 s_1^T) = \log(1 + P_1). \tag{23}$$

Following (21), we have, for $k = 2, \ldots, n$,

$$J_k = \max_{f_1, \ldots, f_{k-1}} \Bigl[ \log\det(S_{k-1} S_{k-1}^T) + \log\Bigl(1 + \bigl(\sqrt{P_k} + \|g_k \Pi_{k-1}\|\bigr)^2\Bigr) \Bigr]. \tag{24}$$

We wish to show that both terms in (24) are individually maximized by the same optimizer

$$(f_1^*, \ldots, f_{k-1}^*) = \arg\max_{f_1, \ldots, f_{k-1}} \det(S_{k-1} S_{k-1}^T)
= \arg\max_{f_1, \ldots, f_{k-1}} \|g_k \Pi_{k-1}\| \tag{25}$$

for $k = 2, \ldots, n$. Once we establish (25), the desired recursion formula (18) for $J_k$ follows
immediately from the definition of $J_k$ and (29).


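We shall prove (25) by induction. First note that

$$g_1 = 0, \qquad g_k = \alpha e_{k-1}, \quad k = 2, 3, \ldots, \tag{26}$$

and

$$e_k s_k^T = 1, \qquad k = 1, 2, \ldots.$$

Also recall that $s_k = f_k + g_k + e_k$ and $\Pi_k = I - S_k^T (S_k S_k^T)^{-1} S_k$.

For $k = 2$, we trivially have

$$\|g_2 \Pi_1\|^2 = \alpha^2 e_1 \Bigl(I - \frac{s_1^T s_1}{s_1 s_1^T}\Bigr) e_1^T
= \alpha^2 \Bigl(1 - \frac{1}{s_1 s_1^T}\Bigr)
= \alpha^2 \Bigl(1 - \frac{1}{\det(S_1 S_1^T)}\Bigr),$$

which establishes (25) for $k = 2$. Further, from (23) and (24), we can check that

$$\begin{aligned}
J_2 &= \max_{f_1} \Bigl[ \log(s_1 s_1^T) + \log\Bigl(1 + \Bigl(\sqrt{P_2} + |\alpha| \sqrt{1 - \frac{1}{s_1 s_1^T}}\,\Bigr)^2\Bigr) \Bigr] \qquad (27) \\
&= \log(1 + P_1) + \log\Bigl(1 + \Bigl(\sqrt{P_2} + |\alpha| \sqrt{1 - \frac{1}{1 + P_1}}\,\Bigr)^2\Bigr). \qquad (28)
\end{aligned}$$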



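Now suppose (25) holds for $k = 2, \ldots, m - 1$. For $k \ge 3$, we observe that

$$\begin{aligned}
\Pi_{k-1} &= I - \begin{bmatrix} S_{k-2} \\ s_{k-1} \end{bmatrix}^T
\begin{bmatrix} S_{k-2} S_{k-2}^T & S_{k-2} s_{k-1}^T \\ s_{k-1} S_{k-2}^T & s_{k-1} s_{k-1}^T \end{bmatrix}^{-1}
\begin{bmatrix} S_{k-2} \\ s_{k-1} \end{bmatrix} \\
&= I - S_{k-2}^T (S_{k-2} S_{k-2}^T)^{-1} S_{k-2} - \Pi_{k-2}\, s_{k-1}^T (s_{k-1} \Pi_{k-2} s_{k-1}^T)^{-1} s_{k-1}\, \Pi_{k-2} \\
&= \Pi_{k-2} \bigl(I - \Pi_{k-2}\, s_{k-1}^T (s_{k-1} \Pi_{k-2} s_{k-1}^T)^{-1} s_{k-1}\, \Pi_{k-2}\bigr) \Pi_{k-2}.
\end{aligned}$$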

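Now, using (26) together with the identities $e_{k-1} \Pi_{k-2} = e_{k-1}$ and $e_{k-1} s_{k-1}^T = 1$, we have

$$\begin{aligned}
\|g_k \Pi_{k-1}\|^2 &= g_k \Pi_{k-1} g_k^T \\
&= g_k \Pi_{k-2} \bigl(I - \Pi_{k-2}\, s_{k-1}^T (s_{k-1} \Pi_{k-2} s_{k-1}^T)^{-1} s_{k-1}\, \Pi_{k-2}\bigr) \Pi_{k-2}\, g_k^T \\
&= \alpha^2 e_{k-1} \bigl(I - \Pi_{k-2}\, s_{k-1}^T (s_{k-1} \Pi_{k-2} s_{k-1}^T)^{-1} s_{k-1}\, \Pi_{k-2}\bigr) e_{k-1}^T \\
&= \alpha^2 \Bigl(1 - \frac{1}{s_{k-1} \Pi_{k-2} s_{k-1}^T}\Bigr) \\
&= \alpha^2 \Bigl(1 - \frac{1}{1 + (f_{k-1} + g_{k-1})\, \Pi_{k-2}\, (f_{k-1} + g_{k-1})^T}\Bigr).
\end{aligned} \tag{29}$$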

It follows from (20)-(22) and (29) that, for fixed $(f_1, \ldots, f_{m-2})$, both $\det(S_{m-1} S_{m-1}^T)$ and
$\|g_m \Pi_{m-1}\|$ have the same maximizer

$$f_{m-1}^* = \sqrt{P_{m-1}}\, \frac{g_{m-1} \Pi_{m-2}}{\|g_{m-1} \Pi_{m-2}\|}.$$

Plugging this back into (29), for fixed $(f_1, \ldots, f_{m-2})$, we have

$$\max_{f_{m-1}} \|g_m \Pi_{m-1}\|^2 = \alpha^2 \Bigl(1 - \frac{1}{1 + \bigl(\sqrt{P_{m-1}} + \|g_{m-1} \Pi_{m-2}\|\bigr)^2}\Bigr)$$

while

$$\max_{f_{m-1}} \det(S_{m-1} S_{m-1}^T) = \det(S_{m-2} S_{m-2}^T) \cdot \Bigl(1 + \bigl(\sqrt{P_{m-1}} + \|g_{m-1} \Pi_{m-2}\|\bigr)^2\Bigr).$$

But from the induction hypothesis, $\det(S_{m-2} S_{m-2}^T)$ and $\|g_{m-1} \Pi_{m-2}\|$ have the same
maximizer $(f_1^*, \ldots, f_{m-2}^*)$. Thus $\det(S_{m-1} S_{m-1}^T)$ and $\|g_m \Pi_{m-1}\|$ have the same maximizer
$(f_1^*, \ldots, f_{m-1}^*)$. Therefore, we have established (25) for $k = m$ and hence for all $k = 2, 3, \ldots$.
From (24) and (29), we easily get the desired recursion formula as

$$J_k = J_{k-1} + \log\Bigl(1 + \bigl(\sqrt{P_k} + |\alpha| \sqrt{1 - e^{-(J_{k-1} - J_{k-2})}}\,\bigr)^2\Bigr), \qquad k = 2, 3, \ldots.$$
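The recursion for $J_k$ is straightforward to evaluate for any given power allocation. The sketch below is an illustrative implementation with a hypothetical function name, written as the recursion reads here, with $J_0 = 0$ and $J_1 = \log(1 + P_1)$. The two checks cover the white-noise case, where $J_n = \sum_i \log(1 + P_i)$, and the explicit two-step value $J_2 = \log(1 + P_1) + \log\bigl(1 + (\sqrt{P_2} + |\alpha|\sqrt{1 - 1/(1 + P_1)})^2\bigr)$.

```python
import math

def j_sequence(powers, alpha):
    """Return [J_0, J_1, ..., J_n] for the power allocation `powers` = (P_1, ..., P_n)."""
    a = abs(alpha)
    J = [0.0]                                   # J_0 := 0
    if powers:
        J.append(math.log(1.0 + powers[0]))     # J_1 = log(1 + P_1)
    for P_k in powers[1:]:
        # |alpha| * sqrt(1 - exp(-(J_{k-1} - J_{k-2}))) plays the role of ||g_k Pi_{k-1}||.
        memory = a * math.sqrt(1.0 - math.exp(-(J[-1] - J[-2])))
        J.append(J[-1] + math.log(1.0 + (math.sqrt(P_k) + memory) ** 2))
    return J

J_white = j_sequence([1.0, 2.0, 3.0], alpha=0.0)
J_ma = j_sequence([1.0, 1.0], alpha=0.5)
```

For a uniform allocation, dividing the last entry by $2n$ gives the rate achieved over the first $n$ transmissions.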




Step 3. Optimal power allocation over time.

In the previous step, we solved the optimization problem (15) under a fixed power
allocation $(P_1, \ldots, P_n)$. Thanks to the special structure of the MA(1) noise process, this
brute force optimization was tractable via backward dynamic programming. Here we
optimize the power allocation $(P_1, \ldots, P_n)$ under the constraint $\sum_{i=1}^{n} P_i \le nP$.

As we saw earlier, when $\alpha = 0$, we can use the concavity of the logarithm to show that,
for all $n$,

$$C_{n,FB} = \max_{P_i :\, \sum_i P_i \le nP} \frac{1}{2n} J_n(P_1, \ldots, P_n)
= \max_{P_i :\, \sum_i P_i \le nP} \frac{1}{2n} \sum_{i=1}^{n} \log(1 + P_i) = \frac{1}{2} \log(1 + P),$$

with $P_1^* = \cdots = P_n^* = P$. When $\alpha \ne 0$, it is not tractable to optimize $(P_1, \ldots, P_n)$ for $J_n$
in (16)-(18) to get a closed-form solution of $C_{n,FB}$ for finite $n$. The following lemma,
however, enables us to figure out the asymptotically optimal power allocation and to obtain
a closed-form solution for $C_{FB} = \lim_{n} C_{n,FB}$.

Lemma 2. Let $\psi : [0, \infty) \times [0, \infty) \to [0, \infty)$ be such that the following conditions hold:
(i) $\psi(\xi, \zeta)$ is continuous, concave in $(\xi, \zeta)$, and strictly concave in $\xi$ for all $\zeta > 0$;
(ii) $\psi(\xi, \zeta)$ is increasing in $\xi$ and $\zeta$, respectively; and
(iii) for each $\zeta > 0$, there is a unique solution $\xi^* = \xi^*(\zeta) > 0$ to the equation $\xi = \psi(\xi, \zeta)$.

[Figure 3: Convergence to the unique point $\xi^*$.]

For some fixed $P > 0$, let $\{P_i\}_{i=1}^{\infty}$ be any infinite sequence of nonnegative numbers satisfying

$$\limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} P_i \le P.$$

Let $\{\xi_i\}_{i=0}^{\infty}$ be defined recursively as

$$\xi_0 = 0, \qquad \xi_i = \psi(\xi_{i-1}, P_i), \quad i = 1, 2, \ldots.$$

Then

$$\limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \xi_i \le \xi^*,$$

where $\xi^* = \xi^*(P)$ is the unique solution to $\xi = \psi(\xi, P)$. Furthermore, if $P_i = P$, $i = 1, 2, \ldots$,
then the corresponding $\xi_i$ converges to $\xi^*$.


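Proof. Fix $\epsilon > 0$. From the concavity and monotonicity of $\psi$, for $n$ sufficiently large,

$$\frac{1}{n}\sum_{i=1}^n \xi_i = \frac{1}{n}\sum_{i=1}^n \psi(\xi_{i-1}, P_i) \le \psi\left(\frac{1}{n}\sum_{i=1}^n \xi_{i-1},\ \frac{1}{n}\sum_{i=1}^n P_i\right) \le \psi\left(\frac{1}{n}\sum_{i=1}^n \xi_{i-1},\ P + \epsilon\right).$$

Taking limsup on both sides and using the continuity of $\psi$, we have

$$\xi^{**} := \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n \xi_i \le \limsup_{n\to\infty} \psi\left(\frac{1}{n}\sum_{i=1}^n \xi_{i-1},\ P + \epsilon\right) = \psi(\xi^{**}, P + \epsilon).$$

Since $\epsilon$ is arbitrary and $\psi$ is continuous, we have $\xi^{**} \le \psi(\xi^{**}, P)$. But from uniqueness of
$\xi^*$ and strict concavity of $\psi$ in $\xi$, we have

$$\xi \le \xi^* \ \text{ if and only if } \ \xi \le \psi(\xi, P). \qquad (30)$$

Thus $\xi^{**} \le \xi^*$.

It remains to show that we can actually attain $\xi^*$ by choosing $P_i \equiv P$, $i = 1, 2, \ldots$. Let
$\xi_0 = 0$ and $\xi_i = \psi(\xi_{i-1}, P)$, $i = 1, 2, \ldots$. From the monotonicity of $\psi(\cdot, P)$ and (30), we have

$$\xi_{i-1} \le \xi_i = \psi(\xi_{i-1}, P) \le \xi^* = \psi(\xi^*, P), \quad i = 1, 2, \ldots.$$

Thus the sequence $\{\xi_i\}$ has a limit, which we denote as $\xi^{**}$. But from the continuity of
$\psi(\cdot, P)$, we must have

$$\xi^{**} = \lim_{i\to\infty} \xi_i = \lim_{i\to\infty} \psi(\xi_{i-1}, P) = \psi\left(\lim_{i\to\infty} \xi_{i-1}, P\right) = \psi(\xi^{**}, P).$$

Thus $\xi^{**} = \xi^*$. □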

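We continue our main discussion. Define

$$\psi(\xi, \zeta) := \frac{1}{2}\log\left(1 + \left(\sqrt{\zeta} + |a|\sqrt{1 - e^{-2\xi}}\right)^2\right), \quad \xi, \zeta \ge 0.$$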

The conditions (i) - (iii) of Lemma 2 can be easily checked. For concavity, we rely on the 
simple composition rule for concave functions [H^ Section 3.2.4] without messy calculus. 
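Let $\psi_1(\xi) = \frac{1}{2}\log(1 + \xi)$, $\psi_2(\xi, \zeta) = (\sqrt{\zeta} + |a|\sqrt{\xi})^2$, and $\psi_3(\xi) = 1 - \exp(-2\xi)$. Then

$\psi(\xi, \zeta) = \psi_1(\psi_2(\psi_3(\xi), \zeta))$. Since $\psi_1$ is strictly concave and strictly increasing, $\psi_2$ is
concave (strictly concave in $\xi$ alone for each $\zeta > 0$) and elementwise strictly increasing,
and $\psi_3$ is strictly concave, we can conclude that $\psi$ is concave in $(\xi, \zeta)$ and strictly concave
in $\xi$ for all $\zeta > 0$. Since for any $\zeta > 0$, $\psi(0, \zeta) > 0$ and $\psi(\xi, \zeta) \to c(\zeta) < \infty$ as $\xi$ tends to
infinity, the existence of a root of $\xi = \psi(\xi, \zeta)$ follows from the continuity of $\psi$, and its uniqueness from the strict concavity in $\xi$.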
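The behaviour asserted by Lemma 2 can be observed numerically. The following is a minimal sketch under stated assumptions: it takes $\psi(\xi, \zeta) = \frac{1}{2}\log(1 + (\sqrt{\zeta} + |a|\sqrt{1 - e^{-2\xi}})^2)$, uses natural logarithms, and the helper names `psi` and `xi_path` are hypothetical. It iterates $\xi_i = \psi(\xi_{i-1}, P_i)$ for the uniform allocation and for an alternating allocation with the same average power.

```python
import math

def psi(xi, zeta, a):
    """Assumed MA(1) map: (1/2) log(1 + (sqrt(zeta) + |a| sqrt(1 - exp(-2 xi)))^2)."""
    root = math.sqrt(1.0 - math.exp(-2.0 * xi))
    return 0.5 * math.log1p((math.sqrt(zeta) + abs(a) * root) ** 2)

def xi_path(powers, a):
    """Return [xi_0, xi_1, ..., xi_n] with xi_0 = 0 and xi_i = psi(xi_{i-1}, P_i)."""
    xs = [0.0]
    for p in powers:
        xs.append(psi(xs[-1], p, a))
    return xs

a, P = 0.5, 1.0
uniform = xi_path([P] * 200, a)             # P_i = P for all i
xi_star = uniform[-1]                       # numerical fixed point of psi(., P)
alternating = xi_path([0.5, 1.5] * 100, a)  # same average power P
avg_alternating = sum(alternating[1:]) / 200
```

In this sketch the uniform path increases monotonically to the fixed point, while the time average of the alternating path stays below it, in line with the lemma.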

For an arbitrary infinite sequence $\{P_i\}_{i=1}^\infty$ satisfying

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n P_i \le P, \qquad (31)$$

we define

$$\xi_0 = 0, \qquad \xi_i = \psi(\xi_{i-1}, P_i), \quad i = 1, 2, \ldots.$$
Note that

$$\xi_i = \frac{1}{2}\left(J_i(P_1, \ldots, P_i) - J_{i-1}(P_1, \ldots, P_{i-1})\right), \quad i = 2, 3, \ldots.$$

Now from Lemma 2, we have

$$\limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n) = \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n \xi_i \le \xi^*,$$

where $\xi^*$ is the unique solution to

$$\xi = \psi(\xi, P) = \frac{1}{2}\log\left(1 + \left(\sqrt{P} + |a|\sqrt{1 - e^{-2\xi}}\right)^2\right).$$

Since our choice of $\{P_i\}$ is arbitrary, we conclude that

$$\sup_{\{P_i\}} \limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n) = \lim_{n\to\infty} \frac{1}{2n} J_n(P, \ldots, P) = \xi^*,$$

where the supremum (in fact, maximum) is over all infinite sequences $\{P_i\}$ satisfying the
asymptotic average power constraint (31).

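Finally, we prove that $C_{\mathrm{FB}} = \xi^*$. More specifically, we will show that

$$\begin{aligned}
C_{\mathrm{FB}} &= \lim_{n\to\infty} C_{n,\mathrm{FB}} \\
&= \lim_{n\to\infty} \max_{P_i:\,\sum_i P_i \le nP} \frac{1}{2n} J_n(P_1, \ldots, P_n) \qquad (32)\\
&= \sup_{\{P_i\}} \limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n) \qquad (33)\\
&= \xi^*.
\end{aligned}$$

The only subtlety here is how to justify the interchange of the order of limit and supremum
in (32) and (33). It is easy to verify that

$$\lim_{n\to\infty} \max_{P_i:\,\sum_i P_i \le nP} \frac{1}{2n} J_n(P_1, \ldots, P_n) \ge \sup_{\{P_i\}} \limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n),$$

for it is always advantageous to choose for each $n$ a finite sequence $(P_1, \ldots, P_n)$ with
$\sum_{i=1}^n P_i \le nP$ rather than fixing a single infinite sequence $\{P_i\}$ with $P_i = P$
for all $i$. (Recall that the supremum on the right side is achieved by the uniform power
allocation.)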
To prove the other direction of inequality, we fix $\epsilon > 0$ and choose $n$ and $(P_1^*, \ldots, P_n^*)$
such that

$$\sum_{i=1}^n P_i^* \le nP$$

and

$$\frac{1}{2n} J_n(P_1^*, \ldots, P_n^*) \ge C_{\mathrm{FB}} - \epsilon. \qquad (34)$$

Now we construct an infinite sequence $\{P_i\}_{i=1}^\infty$ by concatenating $(P_1^*, \ldots, P_n^*)$ repeatedly,
that is, $P_{kn+i} = P_i^*$ for all $i = 1, \ldots, n$, and $k = 0, 1, \ldots$. Obviously, this choice of $\{P_i\}$ satisfies
the power constraint (31). As before, let $\xi_0 = 0$ and $\xi_i = \psi(\xi_{i-1}, P_i)$, $i = 1, 2, \ldots$. By induction,
it is easy to see that

$$\xi_i \le \xi_{kn+i}, \quad i = 1, 2, \ldots, n, \qquad (35)$$

for all $k = 0, 1, \ldots$. For $k = 0$, (35) holds trivially. Suppose (35) holds for $k = 0, \ldots, m - 1$.
Then from the monotonicity of $\psi(\xi, \zeta)$ in $\xi$, we have

$$\xi_1 = \psi(\xi_0, P_1) = \psi(\xi_0, P_1^*) \le \psi(\xi_{mn}, P_1^*) = \psi(\xi_{mn}, P_{mn+1}) = \xi_{mn+1},$$

$$\xi_2 = \psi(\xi_1, P_2) = \psi(\xi_1, P_2^*) \le \psi(\xi_{mn+1}, P_2^*) = \psi(\xi_{mn+1}, P_{mn+2}) = \xi_{mn+2},$$

and in general

$$\xi_i = \psi(\xi_{i-1}, P_i) = \psi(\xi_{i-1}, P_i^*) \le \psi(\xi_{mn+i-1}, P_i^*) = \psi(\xi_{mn+i-1}, P_{mn+i}) = \xi_{mn+i}$$

for all $i = 1, \ldots, n$. Thus, (35) holds for all $k$. Therefore

$$\frac{1}{2kn} J_{kn}(P_1, \ldots, P_{kn}) = \frac{1}{kn}\sum_{i=1}^{kn} \xi_i \ge \frac{1}{kn}\left(k\sum_{i=1}^n \xi_i\right) = \frac{1}{2n} J_n(P_1^*, \ldots, P_n^*),$$

which, combined with (34), implies that

$$\frac{1}{2kn} J_{kn}(P_1, \ldots, P_{kn}) \ge C_{\mathrm{FB}} - \epsilon, \quad k = 1, 2, \ldots,$$

which, in turn, implies that

$$\sup_{\{P_i\}} \limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n) \ge C_{\mathrm{FB}} - \epsilon.$$

Since $\epsilon$ is arbitrary, we have the desired inequality. Thus $C_{\mathrm{FB}} = \xi^*$.
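The block-repetition argument can be checked numerically on a small example. This is only a sketch: the block of powers is an arbitrary illustrative choice, $\psi$ is assumed to be $\frac{1}{2}\log(1 + (\sqrt{\zeta} + |a|\sqrt{1 - e^{-2\xi}})^2)$, and the helper names are hypothetical. It verifies that $\xi_i \le \xi_{kn+i}$ within every repetition, so that $k$ repetitions accumulate at least $k$ times the first block's sum.

```python
import math

def psi(xi, zeta, a):
    """Assumed MA(1) map from the main text (natural logarithm)."""
    root = math.sqrt(1.0 - math.exp(-2.0 * xi))
    return 0.5 * math.log1p((math.sqrt(zeta) + abs(a) * root) ** 2)

def xi_path(powers, a):
    """Return [xi_0, ..., xi_N] with xi_0 = 0 and xi_i = psi(xi_{i-1}, P_i)."""
    xs = [0.0]
    for p in powers:
        xs.append(psi(xs[-1], p, a))
    return xs

block = [0.2, 1.7, 0.9, 1.2]      # illustrative (P_1*, ..., P_n*), not from the paper
n, k, a = len(block), 5, 0.8
xi = xi_path(block * k, a)        # concatenate the block k times

# Inequality (35): xi_i <= xi_{rn+i} for every repetition r and position i.
dominated = all(
    xi[i] <= xi[r * n + i] + 1e-12
    for r in range(k)
    for i in range(1, n + 1)
)
first_block_sum = sum(xi[1:n + 1])
total_sum = sum(xi[1:])
```

Only the monotonicity of $\psi$ in its first argument is used, exactly as in the induction above.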
We conclude this section by characterizing the capacity $C_{\mathrm{FB}} = \xi^*$ in an alternative
form. Recall that $\xi^*$ is the unique solution to

$$\xi = \frac{1}{2}\log\left(1 + \left(\sqrt{P} + |a|\sqrt{1 - e^{-2\xi}}\right)^2\right).$$

Let $x_0 = \exp(-\xi^*)$, or equivalently, $\xi^* = -\log x_0$. It is easy to verify that $0 < x_0 < 1$ is
the unique positive solution to

$$\frac{1}{x^2} = 1 + \left(\sqrt{P} + |a|\sqrt{1 - x^2}\right)^2,$$

or equivalently,

$$P x^2 = (1 - x^2)(1 - |a|x)^2.$$

This establishes the feedback capacity $C_{\mathrm{FB}}$ of the additive Gaussian noise channel with
the noise covariance $K'_Z$, which is, in turn, the feedback capacity of the first-order moving
average additive Gaussian noise channel with parameter $a$, as is argued at the end of Step
1 and proved in the Appendix. This completes the proof of Theorem 1.
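The closed-form characterization is easy to evaluate numerically. The sketch below (an illustration, with the hypothetical name `ma1_feedback_capacity`) finds the root of $P x^2 = (1 - x^2)(1 - |a|x)^2$ in $(0, 1)$ by bisection and returns $-\log x_0$ in nats. Bisection applies because, for $|a| \le 1$, the difference of the two sides is positive at $x = 0$, negative at $x = 1$, and decreasing in between.

```python
import math

def ma1_feedback_capacity(P, a, tol=1e-13):
    """Return -log x0 (nats), x0 in (0, 1) solving P x^2 = (1 - x^2)(1 - |a| x)^2."""
    def f(x):
        return (1.0 - x * x) * (1.0 - abs(a) * x) ** 2 - P * x * x

    lo, hi = 0.0, 1.0                 # f(0) = 1 > 0 and f(1) = -P < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return -math.log(0.5 * (lo + hi))
```

For $a = 0$ the root is $x_0 = 1/\sqrt{1+P}$, which recovers the white-noise capacity $\frac{1}{2}\log(1+P)$; for $a \ne 0$ the value is strictly larger at the same $P$.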



4 Discussion 



The derived asymptotically optimal feedback input signal sequence, or equivalently, the
(sequence of) matrices $(K_V^*(n), B^*(n))$, has two prominent properties. First, the optimal
$(K_V^*(n), B^*(n))$ for the $n$-block can be found sequentially, built on the optimal $(K_V^*(n-1), B^*(n-1))$
for the $(n-1)$-block. Although this property may sound quite natural, it is
not true in general for other channel models. Later in this section, we will see an MA(2)
channel counterexample. As a corollary to this sequentiality property, the optimal $K_V$ has
rank one, which agrees with the previous result by Ordentlich [21]. Secondly, the current
input signal $X_k$ is orthogonal to the past output signals $(Y_1, \ldots, Y_{k-1})$. In the notation of
Section 3, we have $f_k S_{k-1}^T = 0$. This orthogonality property is indeed a necessary condition
for the optimal $(K_V^*, B^*)$ for any (possibly nonstationary nonergodic) noise covariance
matrix $K_Z$ [?]. It should be pointed out that the recursion formula for $J_k$ can
also be derived from the orthogonality property and the optimality of rank-one $K_V$.

We explore the possibility of extending the current proof technique to a more general
class of noise processes. The immediate answer is negative. We comment on two simple
cases: MA(2) and AR(1). Consider the following MA(2) noise process, which is essentially
two interleaved MA(1) processes:

$$Z_i = U_i + aU_{i-2}, \quad i = 1, 2, \ldots.$$

It is easy to see that this channel has the same capacity as the MA(1) channel with
parameter $a$, which can be attained by signalling separately for each interleaved MA(1)
channel. This suggests that the sequentiality property does not hold for this example.
Indeed, if we sequentially optimize the $n$-block capacity, we achieve the rate $-\log x_0$,
where $x_0$ is the unique positive root of the sixth order polynomial

$$P x^2 = (1 - x^2)(1 - |a|x^2)^2.$$

It is not difficult to see that this rate is strictly less than the feedback capacity of the
interleaved MA(1) channel unless $a = 0$. A similar argument can prove that Butman's
conjecture on the AR($k$) capacity [?, Abstract] is not true in general for $k > 1$.
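The gap between the sequentially optimized rate and the interleaved MA(1) capacity can be illustrated numerically. The sketch below is hedged: it assumes that the sixth order polynomial reads $P x^2 = (1 - x^2)(1 - |a|x^2)^2$ and that the MA(1) capacity is $-\log x_0$ with $P x_0^2 = (1 - x_0^2)(1 - |a|x_0)^2$; all function names are hypothetical.

```python
import math

def root_in_unit_interval(f, tol=1e-13):
    """Bisection for a function with f(0) > 0 > f(1) that decreases on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ma1_capacity(P, a):
    """-log x0 for P x^2 = (1 - x^2)(1 - |a| x)^2 (interleaved MA(1) signalling)."""
    x0 = root_in_unit_interval(lambda x: (1 - x * x) * (1 - abs(a) * x) ** 2 - P * x * x)
    return -math.log(x0)

def ma2_sequential_rate(P, a):
    """-log x0 for the assumed sixth order polynomial P x^2 = (1 - x^2)(1 - |a| x^2)^2."""
    x0 = root_in_unit_interval(lambda x: (1 - x * x) * (1 - abs(a) * x * x) ** 2 - P * x * x)
    return -math.log(x0)
```

Since $x^2 < x$ on $(0, 1)$, the sixth order polynomial has the larger root whenever $a \ne 0$, which is why the sequential rate falls below the interleaved one.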

In contrast to MA(1) channels, we are missing two basic ingredients for AR(1) channels:
the optimality of rank-one $K_V$ and the asymptotic optimality of the uniform power
allocation. Under these two conditions, both of which are yet to be justified, it is known
[?] that the optimal achievable rate is given by $-\log x_0$, where $x_0$ is the unique positive
root of the fourth order polynomial

$$P x^2 = \frac{1 - x^2}{(1 + |a|x)^2}.$$

There is, however, a major difficulty in establishing the above two conditions by the
two-stage optimization strategy we used in the previous section, namely, first maximizing
over $(f_1, \ldots, f_n)$ and then over $(P_1, \ldots, P_n)$. For certain values of individual signal power
constraints $(P_1, \ldots, P_n)$, the optimal $(f_1, \ldots, f_n)$ does not satisfy the sequentiality, resulting
in $K_V$ with rank higher than one. Hence, a greedy maximization of $\log\det(S_k S_k^T)$ does
not establish the recursion formula for the AR(1) $n$-block capacity that corresponds to our
MA(1) recursion:

$$\begin{aligned}
J_0 &:= 0,\\
J_1 &= \log(1 + P_1),\\
J_{k+1} &= J_k + \log\left(1 + \left(\sqrt{P_{k+1}} + |a|\sqrt{P_k}\,e^{-(J_k - J_{k-1})/2}\right)^2\right), \quad k = 1, 2, \ldots.
\end{aligned}$$

(See [15], [22], [?] for the derivation of the above recursion formula.) Even under the
assumption that the optimal $K_V$ for the AR(1) channel has rank one, it has been unclear
whether the uniform power allocation over time is asymptotically optimal.

Nonetheless, using a technique similar to the one deployed in Lemma 2, we can prove the
optimality of the uniform power allocation, resolving a question raised by Butman [14], [?]
and Tiernan [?] among others. Since the proof is a little technical in nature, we defer it
to the Appendix.

Finally we show that the feedback capacity of the MA(1) channel can be achieved
by using a simple stationary filter of the noise innovation process. Before we proceed,
we point out that the optimal input process $\{X_i\}$ we obtained in the previous section is
asymptotically stationary. This observation is not hard to prove through the well-developed
theory on the asymptotic behavior of recursive estimators [?, Chapter 14].

At the beginning, we send¹

$$X_1 \sim N(0, P).$$

For subsequent transmissions, we transmit the filtered version of the noise innovation
process up to the time $k - 1$:

$$X_k = \beta X_{k-1} + \sigma U_{k-1}, \quad k = 2, 3, \ldots. \qquad (36)$$

In other words, we use a first-order autoregressive filter with transfer function given by

$$\frac{\sigma z^{-1}}{1 - \beta z^{-1}}. \qquad (37)$$

Here $\beta = -\operatorname{sgn}(a)\,x_0$ with $x_0$ being the same unique positive root of the fourth-order
polynomial in Theorem 1. The scaling factor $\sigma$ is chosen to satisfy the power constraint
as

$$\sigma = \operatorname{sgn}(a)\sqrt{P(1 - \beta^2)},$$

where

$$\operatorname{sgn}(\zeta) = \begin{cases} 1, & \zeta \ge 0, \\ -1, & \zeta < 0. \end{cases}$$

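This input process and the MA(1) noise process

$$Z_k = aU_{k-1} + U_k, \quad k = 1, 2, \ldots,$$

yield the output process given by

$$\begin{aligned}
Y_1 &= X_1 + aU_0 + U_1,\\
Y_k &= \beta X_{k-1} + (a + \sigma)U_{k-1} + U_k\\
&= \beta Y_{k-1} - a\beta U_{k-2} + (a - \beta + \sigma)U_{k-1} + U_k, \quad k = 2, 3, \ldots,
\end{aligned}$$

which is asymptotically stationary with power spectral density

$$\begin{aligned}
S_Y(\omega) &= \left|1 + a e^{-j\omega} + \frac{\sigma e^{-j\omega}}{1 - \beta e^{-j\omega}}\right|^2\\
&= \left|\frac{1 + (a - \beta + \sigma)e^{-j\omega} - a\beta e^{-j2\omega}}{1 - \beta e^{-j\omega}}\right|^2\\
&= \left|\frac{(1 + a\beta^2 e^{-j\omega})(1 - \beta^{-1}e^{-j\omega})}{1 - \beta e^{-j\omega}}\right|^2\\
&= \beta^{-2}\left|1 + a\beta^2 e^{-j\omega}\right|^2. \qquad (38)
\end{aligned}$$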



¹Technically, we generate $2^{nR}$ code functions $X_1(W)$ i.i.d. according to $N(0, P)$ for some $R < C_{\mathrm{FB}}$, and
transmit one of them.
The "asymptotic stationarity" here should not bother us since $\{Y_k\}$ is stationary for $k \ge 2$
and $h(Y_1|Y_2, \ldots, Y_n)$ is uniformly bounded in $n$; hence the entropy rate of the process
$\{Y_k\}_{k=1}^\infty$ is determined by $(Y_2, Y_3, \ldots)$. Thus from (12) in Section 2 the entropy rate of the
output process $\{Y_k\}$ is given by

$$\frac{1}{4\pi}\int_{-\pi}^{\pi} \log(2\pi e S_Y(\omega))\,d\omega = \frac{1}{2}\log(2\pi e\beta^{-2}) = \frac{1}{2}\log(2\pi e x_0^{-2}).$$

Subtracting the entropy rate $\frac{1}{2}\log(2\pi e)$ of the noise process leaves $-\log x_0$.
Hence we attain the feedback capacity $C_{\mathrm{FB}}$. Furthermore, it can be shown that the mean-square
error of $X_1$ given the observations $Y_1, \ldots, Y_n$ decays exponentially with rate $\beta^2 =
2^{-2C_{\mathrm{FB}}}$. In other words,

$$\operatorname{Var}(X_1|Y_1, \ldots, Y_n) = E\left(X_1 - E(X_1|Y_1, \ldots, Y_n)\right)^2 = P\,2^{-2nC_{\mathrm{FB}}}. \qquad (39)$$
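The spectral factorization behind this entropy rate can be verified numerically. The following sketch rests on assumptions: the feedback filter coefficients are taken to be $\beta = -\operatorname{sgn}(a)x_0$ and $\sigma = \operatorname{sgn}(a)\sqrt{P(1-\beta^2)}$, the innovations have unit variance, logarithms are natural, and every function name is hypothetical. It compares the output spectrum computed from its definition with the closed form $\beta^{-2}|1 + a\beta^2 e^{-j\omega}|^2$ and integrates $\log S_Y$ by the midpoint rule.

```python
import cmath
import math

def root_x0(P, a, tol=1e-14):
    """Root in (0, 1) of P x^2 = (1 - x^2)(1 - |a| x)^2, by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (1 - mid * mid) * (1 - abs(a) * mid) ** 2 - P * mid * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def filter_params(P, a):
    """Assumed coefficients: beta = -sgn(a) x0, sigma = sgn(a) sqrt(P (1 - beta^2))."""
    sgn = 1.0 if a >= 0 else -1.0
    beta = -sgn * root_x0(P, a)
    return beta, sgn * math.sqrt(P * (1.0 - beta * beta))

def psd_direct(w, a, beta, sigma):
    """|1 + a e^{-jw} + sigma e^{-jw} / (1 - beta e^{-jw})|^2."""
    z = cmath.exp(-1j * w)
    return abs(1 + a * z + sigma * z / (1 - beta * z)) ** 2

def psd_closed(w, a, beta):
    """beta^{-2} |1 + a beta^2 e^{-jw}|^2."""
    z = cmath.exp(-1j * w)
    return abs(1 + a * beta * beta * z) ** 2 / (beta * beta)

def entropy_rate_gain(a, beta, sigma, n=4096):
    """(1/(4 pi)) * integral over [-pi, pi] of log S_Y(w) dw, midpoint rule."""
    step = 2.0 * math.pi / n
    total = sum(math.log(psd_direct(-math.pi + (k + 0.5) * step, a, beta, sigma))
                for k in range(n))
    return total / (2.0 * n)
```

In this sketch `entropy_rate_gain` agrees with $-\log|\beta| = -\log x_0$, the amount by which the output entropy rate exceeds that of the noise.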



Note that the optimal filter (37) has an interesting feature. In the light of (38), we
can think of the output process $\{Y_k\}$ as the filtered version of the noise innovation process
$\{U_k\}$ through the monic filter

$$1 + a z^{-1} + \frac{\sigma z^{-1}}{1 - \beta z^{-1}} = \frac{(1 + a\beta^2 z^{-1})(1 - \beta^{-1}z^{-1})}{1 - \beta z^{-1}}.$$

As the entropy rate formula (12), or more fundamentally, Jensen's formula (11) shows,
the entropy rate of $\{Y_k\}$ is totally determined by all zeros of the filter outside the unit
circle, which, for our case, is $\beta^{-1}$. Hence, we can interpret the feedback capacity problem
as the problem of relocating the zero of the original noise filter $1 + az^{-1}$ to the outside
of the unit circle and making the modulus of that zero as large as possible by adding a
causal filter $H(z)$ using the power $(2\pi)^{-1}\int |H(e^{j\omega})|^2\,d\omega = P$. Here we have shown that
the optimal filter is given by (37). Under this interpretation, the initial input $X_1$ is merely
a perturbation which guarantees that the output process is not causally invertible from
the innovation process and hence that the entropy rate is fully determined by the spectral
density of the stationary part. (Without $X_1$, the entropy rate of $\{Y_k\}$ is exactly the same as
the entropy rate of $\{Z_k\}$.)

From a classical viewpoint, we can interpret the signal $X_k$ as the adjustment of the
receiver's estimate of the message-bearing signal $X_1$ after observing $(Y_1, \ldots, Y_{k-1})$. We can
further check that the following signalling schemes are equivalent (and thus optimal) up to
scaling:

$$\begin{aligned}
X_k &\propto X_1 - \hat{X}_1(Y^{k-1})\\
&\propto X_j - \hat{X}_j(Y^{k-1}) \quad (j < k)\\
&\propto U_{k-1} - \hat{U}_{k-1}(Y^{k-1})\\
&\propto \hat{Z}_k(Y^{k-1}, X^{k-1}) - \hat{Z}_k(Y^{k-1}).
\end{aligned}$$

The connection to the Schalkwijk-Kailath coding scheme is now apparent. Recall that
there is a simple linear relationship [?, Section 3.4], [67, Section 4.5] between the minimum
mean square error estimate (in other words, the minimum variance biased estimate) for the
Gaussian input $X_1$ and the maximum likelihood estimate (or equivalently, the minimum
variance unbiased estimate) for an arbitrary real input $\theta$. Thus we can easily transform
the above coding scheme based on the asymptotic equipartition property [2] to a variant
of the Schalkwijk-Kailath linear coding scheme based on the maximum likelihood nearest
neighbor decoding of uniformly spaced $2^{nR}$ points. More specifically, we send as $X_1$
one of $2^{nR}$ possible signals, say, $\theta \in \Theta := \{-\sqrt{P}, -\sqrt{P} + \Delta, -\sqrt{P} + 2\Delta, \ldots, \sqrt{P} - 2\Delta, \sqrt{P} - \Delta, \sqrt{P}\}$,
where $\Delta = 2\sqrt{P}/(2^{nR} - 1)$. Subsequent transmissions follow (36). The receiver forms the

maximum likelihood estimate $\hat{\theta}_n(Y_1, \ldots, Y_n)$ and finds the nearest signal point to $\hat{\theta}_n$ in $\Theta$.
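The signal set and the nearest-point decision rule can be sketched as follows. This is an illustration under assumptions: the number of messages $M = 2^{nR}$ is passed in directly as `num_points`, the spacing is taken to be $\Delta = 2\sqrt{P}/(M - 1)$, and both function names are hypothetical. The maximum likelihood estimate itself is treated as a given real number.

```python
import math

def constellation(P, num_points):
    """Return (points, delta): num_points values equally spaced on [-sqrt(P), sqrt(P)]."""
    amp = math.sqrt(P)
    delta = 2.0 * amp / (num_points - 1)
    return [-amp + i * delta for i in range(num_points)], delta

def nearest_index(estimate, P, num_points):
    """Index of the signal point closest to a real-valued estimate."""
    amp = math.sqrt(P)
    delta = 2.0 * amp / (num_points - 1)
    idx = round((estimate + amp) / delta)
    return min(max(idx, 0), num_points - 1)   # clip estimates that fall outside the range
```

A decoding error occurs exactly when the estimate lands more than $\Delta/2$ away from the transmitted point, which is the event analyzed next.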

The analysis of the error for this coding scheme follows Schalkwijk [5] and Butman [?].
From (39) and the standard result on the relationship between the minimum variance
unbiased and biased estimation errors, the maximum likelihood estimation error $\hat{\theta}_n - \theta$
is, conditioned on $\theta$, Gaussian with mean zero and variance exponentially decaying with rate
$\beta^2 = 2^{-2C_{\mathrm{FB}}}$. Thus, the nearest neighbor decoding error, ignoring lower order terms, is
given by



where

$$\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty \exp(-t^2)\,dt,$$

and $\sigma_\theta^2$ is the variance of input signal $\theta$ chosen uniformly over $\Theta$. As long as $R < C_{\mathrm{FB}}$, the
decoding error decays doubly exponentially in $n$. Note that this coding scheme uses only
the second moments of the noise process. This implies that the rate $C_{\mathrm{FB}}$ is achievable for
the additive noise channel with any non-Gaussian noise process with the same covariance
matrix.



Pr \9.r, -9\> 



A 



erfc 



2al 



_2n{CYB-R) 



Appendix 

Asymptotic equivalence of $K_Z$ and $K'_Z$ for feedback capacity

Recall that $Z^n \sim N_n(0, K_Z)$ and $Z'^n \sim N_n(0, K'_Z)$. To stress the dependence of the capacity
on the power constraint and the noise covariance, we use the notation $C_{n,\mathrm{FB}}(K, P)$
for $n$-block feedback capacity of the channel with $n$-block noise covariance matrix $K$ under
the power constraint $E\sum_{i=1}^n X_i^2 \le nP$. With a little abuse of notation, we similarly
use $C_{\mathrm{FB}}(K, P)$ for feedback capacity of the channel with infinite noise covariance matrix
naturally extended from $K$ under the power constraint $P$.
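Suppose $(B^*, K_V^*)$ maximizes

$$C_{n,\mathrm{FB}}(K_Z, P) = \max \frac{1}{2n}\log\frac{\det\left(K_V + (B + I)K_Z(B + I)^T\right)}{\det(K_Z)}$$

and $(B^{**}, K_V^{**})$ maximizes $C_{n,\mathrm{FB}}(K'_Z, P)$. Since $K'_Z \preceq K_Z$, we have

$$\operatorname{tr}\left(B^*K'_Z(B^*)^T + K_V^*\right) \le \operatorname{tr}\left(B^*K_Z(B^*)^T + K_V^*\right) \le nP,$$

which shows that $(B^*, K_V^*)$ is a feasible (not necessarily optimal) solution to $C_{n,\mathrm{FB}}(K'_Z, P)$.
On the other hand, we have

$$(B^* + I)K'_Z(B^* + I)^T \preceq (B^* + I)K_Z(B^* + I)^T, \qquad (40)$$

so that

$$\begin{aligned}
n\,C_{n,\mathrm{FB}}(K_Z, P) &= I(V^n; V^n + (B^* + I)Z^n)\big|_{V^n \sim N(0, K_V^*)}\\
&\le I(V^n; V^n + (B^* + I)Z'^n)\big|_{V^n \sim N(0, K_V^*)} \qquad (41)\\
&\le I(V^n; V^n + (B^{**} + I)Z'^n)\big|_{V^n \sim N(0, K_V^{**})} \qquad (42)\\
&= n\,C_{n,\mathrm{FB}}(K'_Z, P),
\end{aligned}$$

where (41) follows from (40), divisibility of the Gaussian distribution, and the data processing
inequality [?, Section 2.8]; and (42) follows from the optimality of $(B^{**}, K_V^{**})$ for
$C_{n,\mathrm{FB}}(K'_Z, P)$ and the feasibility of $(B^*, K_V^*)$ for $C_{n,\mathrm{FB}}(K'_Z, P)$. By letting $n$ tend to infinity,
we obtain

$$\lim_{n\to\infty} C_{n,\mathrm{FB}}(K_Z, P) \le \liminf_{n\to\infty} C_{n,\mathrm{FB}}(K'_Z, P). \qquad (43)$$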

For the other direction of inequality, we first consider the case $|a| < 1$. Fix $n$ and define
the conditional covariance matrix $K_Z^{(m)}$, $m = 0, 1, \ldots$, of $Z^n$ conditioned on $m$ past values
as

$$K_Z^{(0)} := K_Z,$$

$$K_Z^{(m)} := \operatorname{Cov}(Z^n | Z_0, \ldots, Z_{-m+1}), \quad m = 1, 2, \ldots.$$

It is easy to see that under this notation, the (elementwise) limit of covariance matrices
$K_Z^{(m)}$ exists and

$$\lim_{m\to\infty} K_Z^{(m)} = K'_Z.$$
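The convergence of the $(1, 1)$ entry, $\operatorname{Var}(Z_1 | Z_0, \ldots, Z_{-m+1})$, can be made concrete. The sketch below is an illustration under the assumption of unit-variance innovations and $|a| < 1$; the recursion $v_m = 1 + a^2 - a^2/v_{m-1}$ with $v_0 = 1 + a^2$ is the standard one-step prediction error recursion for an MA(1) process, and the function names are hypothetical.

```python
def prediction_variances(a, m_max):
    """Return [v_0, ..., v_{m_max}] with v_m = Var(Z_1 | Z_0, ..., Z_{-m+1})."""
    v = [1.0 + a * a]                             # v_0 = Var(Z_1), no conditioning
    for _ in range(m_max):
        v.append(1.0 + a * a - a * a / v[-1])     # condition on one more past value
    return v

def closed_form(a, m):
    """v_m = (1 - a^{2(m+2)}) / (1 - a^{2(m+1)}), valid for |a| < 1."""
    q = a * a
    return (1.0 - q ** (m + 2)) / (1.0 - q ** (m + 1))
```

The excess $v_m - 1$ shrinks roughly like $a^{2(m+1)}$, that is, exponentially fast in $m$.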

By sending a length-$m$ training sequence over the channel with the noise covariance matrix
$K_Z$, i.e., by transmitting $X_{-m+1} = \cdots = X_0 = 0$ and then estimating the noise process

at the receiver using $Z_0, \ldots, Z_{-m+1}$, we can achieve the rate $nC_{n,\mathrm{FB}}(K_Z^{(m)}, P)$ over $n + m$
transmissions. Hence, we have

$$nC_{n,\mathrm{FB}}(K_Z^{(m)}, P) \le (n + m)C_{n+m,\mathrm{FB}}(K_Z, P)$$

for all $P$. By carefully increasing both $n$ and $m$, we will derive the desired inequality.



Consider using $(B^{**}, K_V^{**})$, which is optimal for the channel with noise covariance matrix
$K'_Z$, for the channel with noise covariance $K_Z^{(m)}$. Since $K_Z^{(m)} \succeq K'_Z$, the resulting power
usage can be greater than $nP$. However, we have

$$\begin{aligned}
\operatorname{tr}\left(K_V^{**} + B^{**}K_Z^{(m)}(B^{**})^T\right) &= \operatorname{tr}\left(K_V^{**} + B^{**}K'_Z(B^{**})^T + B^{**}(K_Z^{(m)} - K'_Z)(B^{**})^T\right)\\
&\le nP + \operatorname{tr}\left(B^{**}(K_Z^{(m)} - K'_Z)(B^{**})^T\right).
\end{aligned}$$

Now observe that $K_Z^{(m)}$ and $K'_Z$ differ only at the $(1, 1)$ entry. Furthermore, the convergence
of $K_Z^{(m)}(1, 1) = \operatorname{Var}(Z_1 | Z_0, \ldots, Z_{-m+1})$ to $K'_Z(1, 1) = \operatorname{Var}(Z_1 | Z_0, Z_{-1}, \ldots)$ is exponentially
fast in $m$ (uniformly in $n$). Hence, we can bound the amount of additional power usage as

tr (5"(j4-) - K',){B**r) < max (B*;)' max (4™) - K',{m)) 



ner 



where c is a constant independent of n and m. Combining above observations, we have 
the following chain of inequalities for all n and m: 

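$$\begin{aligned}
(n + m)\,C_{n+m,\mathrm{FB}}(K_Z, P + \epsilon_{n,m}) &\ge n\,C_{n,\mathrm{FB}}(K_Z^{(m)}, P + \epsilon_{n,m})\\
&\ge \frac{1}{2}\log\frac{\det\left(K_V^{**} + (I + B^{**})K_Z^{(m)}(I + B^{**})^T\right)}{\det K_Z^{(m)}}\\
&\ge \frac{1}{2}\left[\log\det\left(K_V^{**} + (I + B^{**})K'_Z(I + B^{**})^T\right) - \log\det K_Z^{(m)}\right]\\
&= n\,C_{n,\mathrm{FB}}(K'_Z, P) + \frac{1}{2}\left[\log\det K'_Z - \log\det K_Z^{(m)}\right]. \qquad (44)
\end{aligned}$$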



Finally we let n and m grow to infinity such that 

and n^e-"" 



m 
n 



0. 



The inequality (44) certainly implies that

$$\lim_{n\to\infty} C_{n,\mathrm{FB}}(K_Z, P + \epsilon) \ge \limsup_{n\to\infty} C_{n,\mathrm{FB}}(K'_Z, P)$$

for every $\epsilon > 0$. The desired inequality follows from the continuity of $C_{\mathrm{FB}}(K_Z, P)$ in
$P$.

For the case $|a| = 1$, we can perturb the noise process using a negligible amount of power
and proceed similarly as above. Indeed, if we perturb the original covariance matrices $K'_Z$
and $K_Z$ into the perturbed covariance matrices $K'_Z(\epsilon)$ and $K_Z(\epsilon)$ that correspond to the
MA(1) process with parameter $a(1 - \epsilon)$, we have

$$\begin{aligned}
\limsup_{n\to\infty} C_{n,\mathrm{FB}}(K'_Z, P) &\le C_{\mathrm{FB}}(K'_Z(\epsilon), P + \delta_1(\epsilon)) \qquad (45)\\
&= C_{\mathrm{FB}}(K_Z(\epsilon), P + \delta_1(\epsilon)) \qquad (46)\\
&\le C_{\mathrm{FB}}\left((1 + \delta_2(\epsilon))^{-1}K_Z, P + \delta_3(\epsilon)\right) \qquad (47)\\
&= C_{\mathrm{FB}}\left(K_Z, (1 + \delta_2(\epsilon))(P + \delta_3(\epsilon))\right),
\end{aligned}$$

where (45) follows because we can transform the channel $K'_Z(\epsilon)$ into $K'_Z$ using very small
power, (46) follows from the result for $|a| < 1$ we obtained above, and (47) follows since
we can perturb the channel $(1 + \delta_2(\epsilon))^{-1}K_Z$ into $K_Z(\epsilon)$ by adding some extra white noise.
Since $\delta_k(\epsilon) \to 0$ as $\epsilon \to 0$, $k = 1, 2, 3$, and $C_{\mathrm{FB}}(K_Z, P)$ is continuous in $P$, we have

$$\limsup_{n\to\infty} C_{n,\mathrm{FB}}(K'_Z, P) \le C_{\mathrm{FB}}(K_Z, P).$$

This completes the proof of the asymptotic equivalence of $K_Z$ and $K'_Z$. □



Optimality of uniform power allocation for the Schalkwijk-Kailath-Butman 
coding scheme for the AR(1) Gaussian feedback channel. 

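Recall that for the AR(1) Gaussian feedback channel, the best $n$-block achievable rate
$R_n$ of the Schalkwijk-Kailath-Butman coding scheme, or equivalently, the best achievable
rate over all $K_V$ with rank one, is given by

$$R_n = \max_{P_i:\,\sum_i P_i \le nP} \frac{1}{2n} J_n(P_1, \ldots, P_n),$$

where

$$\begin{aligned}
J_0 &:= 0, \qquad (48)\\
J_1 &= \log(1 + P_1), \qquad (49)\\
J_{k+1} &= J_k + \log\left(1 + \left(\sqrt{P_{k+1}} + |a|\sqrt{P_k}\,e^{-(J_k - J_{k-1})/2}\right)^2\right), \quad k = 1, 2, \ldots. \qquad (50)
\end{aligned}$$

We wish to show that

$$\lim_{n\to\infty} R_n = \lim_{n\to\infty} \frac{1}{2n} J_n(P, \ldots, P) = -\log x_0,$$

where $x_0$ is the unique positive root of the fourth order polynomial

$$P x^2 = \frac{1 - x^2}{(1 + |a|x)^2}. \qquad (51)$$

Define

$$\phi(\xi, \zeta_1, \zeta_2) = \frac{1}{2}\log\left(1 + \left(\sqrt{\zeta_1} + |a|\sqrt{\zeta_2}\,e^{-\xi}\right)^2\right), \quad \xi, \zeta_1, \zeta_2 \ge 0,$$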
and

$$\psi(\xi, \zeta) = \phi(\xi, \zeta, \zeta), \quad \xi, \zeta \ge 0.$$

It is easy to check the following:

(i) $\phi(\xi, \zeta_1, \zeta_2)$ is increasing and concave in $(\zeta_1, \zeta_2)$;

(ii) for each $\zeta_1, \zeta_2 \ge 0$, $\phi(\xi, \zeta_1, \zeta_2)$ is a decreasing contraction of $\xi$ in the sense that

$$\phi(\xi_1, \zeta_1, \zeta_2) - \phi(\xi_2, \zeta_1, \zeta_2) \le \xi_2 - \xi_1$$

for all $\xi_1 \le \xi_2$; and consequently,

(iii) for each $\zeta > 0$, there is a unique solution $\xi^*(\zeta)$ to the equation $\xi = \psi(\xi, \zeta)$ such that

$\psi(\xi, \zeta) > \xi$ for all $\xi < \xi^*(\zeta)$ and $\psi(\xi, \zeta) < \xi$ for all $\xi > \xi^*(\zeta)$.

For an arbitrary infinite sequence $\{P_i\}_{i=0}^\infty$ with $P_0 = 0$ and

$$\limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n P_i \le P, \qquad (52)$$

we define

$$\xi_0 = 0, \qquad \xi_i = \phi(\xi_{i-1}, P_i, P_{i-1}), \quad i = 1, 2, \ldots.$$

Then we can rewrite the recursion formula (48) - (50) as

$$\xi_i = \frac{1}{2}\left(J_i(P_1, \ldots, P_i) - J_{i-1}(P_1, \ldots, P_{i-1})\right), \quad i = 2, 3, \ldots,$$

and we have

$$\frac{1}{2n} J_n(P_1, \ldots, P_n) = \frac{1}{n}\sum_{i=1}^n \xi_i.$$

Now we show that

$$\xi^{**} := \limsup_{n\to\infty} \frac{1}{n}\sum_{i=1}^n \xi_i \le \xi^*,$$

where $\xi^* = \xi^*(P)$ is the unique solution to the equation $\xi = \psi(\xi, P)$. Indeed,




1 



1 



n / n ^ n \ 

n ^ — ^ \ n ^-^ n ^-^ ] 



where the first inequality follows from the aforementioned property (ii) and the second
inequality follows from the property (i) and Jensen's inequality. By taking limits on both
sides, we get from continuity of $\phi(\xi, \zeta_1, \zeta_2)$ in $(\zeta_1, \zeta_2)$

$$\xi^{**} \le \phi(\xi^{**}, P, P) = \psi(\xi^{**}, P),$$

which, from the property (iii), implies that $\xi^{**} \le \xi^*$. We can also check that letting $P_i = P$
for all $i = 1, 2, \ldots$ attains $\xi^{**} = \xi^*$ from the property (ii) and the principle of contraction
mappings [?, Section 14]. (See Figure 4 below and the detailed analysis in [12, Section 5].)
Thus, we conclude that the supremum of $\limsup_{n\to\infty} (2n)^{-1} J_n(P_1, \ldots, P_n)$ over all infinite
power sequences $\{P_i\}$ satisfying the power constraint (52) is achieved by the uniform power
allocation. From simple change of variable $x_0 = \exp(-\xi^*)$, we can easily verify

$$\xi^* = -\log x_0,$$

where $0 < x_0 < 1$ is the unique positive solution to (51).

As in the MA(1) case before, it remains to justify the interchange of the order of limit
and supremum in

$$\lim_{n\to\infty} R_n = \sup_{\{P_i\}} \limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n).$$

Obviously we have

$$\lim_{n\to\infty} \max_{P_i:\,\sum_i P_i \le nP} \frac{1}{2n} J_n(P_1, \ldots, P_n) \ge \lim_{n\to\infty} \frac{1}{2n} J_n(P, \ldots, P).$$
Figure 4: Convergence to the unique point



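For the other direction of inequality, fix $n$ and take $(P_1^*, \ldots, P_{n-1}^*)$ that achieves $R_{n-1}$. We
construct the infinite sequence $\{P_i\}_{i=1}^\infty$ by concatenating $(P_1^*, \ldots, P_{n-1}^*, 0)$ repeatedly, that
is, $P_{kn+i} = P_i^*$, $1 \le i \le n - 1$, $k = 0, 1, \ldots$, and $P_{kn} = 0$ for all $k = 1, 2, \ldots$. Now we can
easily verify that

$$J_{kn}(P_1, \ldots, P_{kn}) = kJ_{n-1}(P_1^*, \ldots, P_{n-1}^*) = 2k(n - 1)R_{n-1}.$$

(Taking $P_{kn} = 0$ resets the dependence on the past.) By taking limits on both sides, we
get

$$\begin{aligned}
\lim_{n\to\infty} R_{n-1} &= \lim_{n\to\infty} \frac{n}{n-1}\cdot\frac{1}{2kn} J_{kn}(P_1, \ldots, P_{kn})\\
&\le \limsup_{n\to\infty} \frac{1}{2n} J_n(P_1, \ldots, P_n)\\
&\le \lim_{n\to\infty} \frac{1}{2n} J_n(P, \ldots, P).
\end{aligned}$$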

This completes the proof of the asymptotic optimality of the uniform power allocation. □ 
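The limit under the uniform power allocation can be reproduced numerically. The sketch below is an illustration under assumptions: it takes $\phi(\xi, \zeta_1, \zeta_2) = \frac{1}{2}\log(1 + (\sqrt{\zeta_1} + |a|\sqrt{\zeta_2}\,e^{-\xi})^2)$ with $P_0 = 0$, uses natural logarithms, and compares the iterated value with $-\log x_0$, where $x_0$ solves $P x^2 (1 + |a|x)^2 = 1 - x^2$; the function names are hypothetical.

```python
import math

def phi(xi, zeta1, zeta2, a):
    """Assumed AR(1) map: (1/2) log(1 + (sqrt(zeta1) + |a| sqrt(zeta2) exp(-xi))^2)."""
    return 0.5 * math.log1p((math.sqrt(zeta1) + abs(a) * math.sqrt(zeta2) * math.exp(-xi)) ** 2)

def ar1_uniform_limit(P, a, steps=500):
    """Iterate xi_i = phi(xi_{i-1}, P_i, P_{i-1}) with P_0 = 0 and P_i = P for i >= 1."""
    xi = phi(0.0, P, 0.0, a)          # xi_1 uses P_0 = 0
    for _ in range(steps):
        xi = phi(xi, P, P, a)
    return xi

def ar1_root_rate(P, a, tol=1e-14):
    """-log x0 with x0 in (0, 1) solving P x^2 (1 + |a| x)^2 = 1 - x^2, by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - mid * mid - P * mid * mid * (1.0 + abs(a) * mid) ** 2 > 0.0:
            lo = mid
        else:
            hi = mid
    return -math.log(0.5 * (lo + hi))
```

Because $\phi$ is a contraction in $\xi$, the iteration settles quickly, and the Cesàro average $(2n)^{-1}J_n(P, \ldots, P)$ shares the same limit.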